July 13th, 2024

gpu.cpp: A lightweight library for portable low-level GPU computation

The GitHub repository hosts gpu.cpp, a lightweight C++ library for portable GPU compute using WebGPU. It offers fast compile/run cycles, minimal dependencies, and examples such as a GELU kernel and matrix multiplication for easy integration.

gpu.cpp is a lightweight C++ library for portable GPU compute. It uses the WebGPU specification as its low-level GPU interface and aims for an API with a high power-to-weight ratio, fast compile/run cycles, and minimal dependencies. Developers can integrate GPU computation into their projects with a standard C++ compiler, relying on a small API surface area and a prebuilt binary of the Dawn native WebGPU implementation. The library includes examples such as a GELU kernel, matrix multiplication, a physics simulation, and signed distance function rendering. It targets projects that need portable on-device GPU computation with low implementation complexity.
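For readers unfamiliar with the GELU example mentioned above, here is a minimal CPU-side sketch of what such a kernel computes, using the widely used tanh approximation of GELU. It does not call gpu.cpp or WebGPU at all. The function names `gelu_scalar` and `gelu_elementwise` are hypothetical and chosen only for illustration, and the library's own kernel may differ in its details.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference for GELU using the common tanh approximation:
//   gelu(x) ~= 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
// Hypothetical helper names; nothing here is part of the gpu.cpp API.
inline float gelu_scalar(float x) {
    const float kSqrt2OverPi = 0.7978845608f;  // sqrt(2 / pi)
    const float inner = kSqrt2OverPi * (x + 0.044715f * x * x * x);
    return 0.5f * x * (1.0f + std::tanh(inner));
}

// Applies GELU to every element independently. On a GPU, each loop
// iteration would typically be handled by its own thread.
inline std::vector<float> gelu_elementwise(const std::vector<float>& input) {
    std::vector<float> output(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        output[i] = gelu_scalar(input[i]);
    }
    return output;
}
```

A reference like this is mainly useful for checking GPU output: run the same input through both paths and compare the results within a small floating-point tolerance.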

20x Faster Background Removal in the Browser Using ONNX Runtime with WebGPU

Running ONNX Runtime with WebGPU and WebAssembly in the browser yields a 20x speedup for background removal, which reduces server load, improves scalability, and keeps user data on the device. ONNX models run efficiently with WebGPU support, offering near real-time performance. IMG.LY aims to use this to make its design tools more accessible and efficient.

A portable lightweight C FFI for Lua, based on libffi

A GitHub repository offers a portable, lightweight C FFI for Lua, based on libffi. Written in C, it aims for LuaJIT FFI compatibility. The repository covers features, examples, basic types, build instructions, testing, and acknowledgements.

Show HN: UNet diffusion model in pure CUDA

The GitHub repository details the optimization of a UNet diffusion model in C++/CUDA to match PyTorch's performance. It covers custom convolution kernels, forward pass improvements, backward pass challenges, and future optimization plans.

GPU profiling for WebGPU workloads on Windows with Chrome

Challenges of GPU profiling for WebGPU in Chrome on Windows are addressed. A workaround using a custom DLL enables GPU profiling with tools like AMD's Radeon GPU Profiler and Nvidia's Nsight, enhancing performance metrics for WebGPU applications.

Karpathy: Let's reproduce GPT-2 (1.6B): one 8XH100 node 24h $672 in llm.c

The GitHub repository focuses on the "llm.c" project by Andrej Karpathy, aiming to implement Large Language Models in C/CUDA without extensive libraries. It emphasizes pretraining GPT-2 and GPT-3 models.

16 comments

By @pavlov - 10 months

Lovely! I like how the API is in a single header file that you can read through and understand in one sitting.

I've worked with OpenGL and Direct3D and Metal in the past, but the pure compute side of GPUs is mostly foreign to me. Learning CUDA always felt like a big time investment when I never had an obvious need at hand.

So I'm definitely going to play with the library and try to get up to speed. Thanks for publishing it.

By @0xf00ff00f - 10 months

This is cool, but they should have just used Vulkan. Dawn is a massive dependency (and a PITA to build, in my experience) to get what's basically a wrapper around Vulkan. Vulkan has a reputation for being difficult to work with, but if you just want to use a compute queue it's not that horrible. Also, since Vulkan uses SPIR-V, the user would have more choices for shading languages. Additionally, with RenderDoc you get source-level shader debugging.

Shameless plug: in case anyone wants to see what doing just compute with Vulkan looks like, I wrote a similar library to compete on SHAllenge [0], which was posted here on HN a few days ago. My library is here: https://github.com/0xf00ff00f/vulkan-compute-playground/

[0] https://shallenge.quirino.net/

By @austinvhuang - 10 months

Hi, author here! Agh I was intending for the project to fly under the radar for a few more days before making the announcement and blog post (please look/upvote that when you see it haha :)

But since this is starting I'm happy to chat. Nice to see the interest here!

By @almostgotcaught - 10 months

TIL you can run the WebGPU runtime without a browser.

By @jph00 - 10 months

We just published an article introducing gpu.cpp, what it's for, and how it works:

https://www.answer.ai/posts/2024-07-11--gpu-cpp.html

By @soci - 10 months

I watched the video mentioned in the post [1], but now I’m more confused than before…

What are the benefits, if any, of using gpu.cpp instead of just webgpu.h (webgpu native) directly? Maybe each is tailored for different use cases?

[1] https://youtu.be/qHrx41aOTUQ?si=CehJnYQWCg3XklHj

By @uLogMicheal - 10 months

This is awesome! Was looking at creating similar, inspired by the miniaudio approach. Will likely contribute a dart wrapper soon.

By @hpen - 10 months

Any performance metrics vs Vulkan, metal, etc?

By @captaincrowbar - 10 months

This looks useful but I'm worried about portability. Are there any plans for native Windows support?

By @Arech - 10 months

Very interesting... I wonder, how does code performance compare to raw Vulkan?

By @coffeeaddict1 - 10 months

Is this intended to integrate well in an existing WebGPU project?

By @apatheticonion - 10 months

Oh nice! Would love to see a Rust crate wrapping bindings for this

By @01HNNWZ0MV43FF - 10 months

> The only library dependency of gpu.cpp is a WebGPU implementation.

Noo

By @kookamamie - 10 months

Portable, as in Windows native is not supported?

By @byefruit - 10 months

This looks great. Is there an equivalent project in rust?
