Kompute – Vulkan Alternative to CUDA
The GitHub repository for the Kompute project offers detailed insights into its principles, features, and examples. It covers Python modules, asynchronous processing, mobile optimization, memory management, unit testing, and more.
The GitHub repository for the Kompute project contains detailed information on its principles, features, architectural overview, and examples. It covers topics such as flexible Python modules, asynchronous processing, mobile optimization, memory management, unit testing, and advanced use cases. It also includes a list of projects using Kompute, GPU multiplication examples, interactive notebooks, hands-on videos, and an explanation of its core components. The project emphasizes asynchronous and parallel operations and mobile compatibility, and provides examples for a range of use cases. Details about the Python package, the CMake-based C++ build system, development guidelines, and the motivations behind the project are also available. This resource serves as a guide to understanding Kompute's capabilities and getting started with it for GPU compute tasks.
Related
Neko: Portable framework for high-order spectral element flow simulations
A portable framework named "Neko" for high-order spectral element flow simulations in modern Fortran. Object-oriented, supports various hardware, with detailed documentation, cloning guidelines, publications, and development acknowledgments. Additional support available for inquiries.
Komorebi: Tiling Window Management for Windows
The "komorebi" project is a tiling window manager for Windows, extending Microsoft's Desktop Window Manager. It offers CLI control, installation guides, configuration details, and a supportive community for contributions and discussions.
Karpathy: Let's reproduce GPT-2 (1.6B): one 8XH100 node 24h $672 in llm.c
The GitHub repository focuses on the "llm.c" project by Andrej Karpathy, aiming to implement Large Language Models in C/CUDA without extensive libraries. It emphasizes pretraining GPT-2 and GPT-3 models.
gpu.cpp: A lightweight library for portable low-level GPU computation
The GitHub repository features gpu.cpp, a lightweight C++ library for portable GPU compute using WebGPU. It offers fast cycles, minimal dependencies, and examples like GELU kernel and matrix multiplication for easy integration.
Nvidia Warp (a Python framework for writing high-performance code)
Warp is a Python framework for high-performance simulation and graphics, compiling functions for CPU or GPU. It supports spatial computing, differentiable kernels, PyTorch, JAX, and USD file generation. Installation options include PyPI and CUDA.
But some of the caveats for compute applications are currently:
- No bfloat16 in shaders
- No shader work graphs (GPU-driven shader control flow)
- No inline PTX (inline GCN/RDNA/GEN is available)
These may or may not be important to you. Vulkan recently gained the ability to seamlessly dispatch CUDA kernels if you need these in some places, but there aren't currently similar Vulkan extensions for HIP.
It may, _in principle_, have been developed - with much more work than has gone into it - into such an alternative; but I am actually not sure of that, since I have a poor command of Vulkan. I became suspicious because I maintain C++ API wrappers for CUDA myself [2], and I know that just doing that is a lot more code and a lot more work.
[1] - I assume it is opinionated to cater to CNN simulation for large language models, and basically not much more.
Does anyone have real world experience using Vulkan compute shaders versus, say, OpenCL? Does Kompute make things as straightforward as it seems?
Until then they are wannabe alternatives, for a subset of use cases, with lesser tooling.
It always feels like those proposing CUDA alternatives don't understand what they are trying to replace, and that is already the first error.
If you are interested in learning more, do join the community through our Discord here: https://discord.gg/MaH5Jv5zwv
For some background, this project started after seeing various renowned machine learning frameworks like PyTorch and TensorFlow integrating Vulkan as a backend. The Vulkan SDK offers a great low-level interface that enables highly specialized optimizations - however, it comes at the cost of highly verbose code, requiring 800-2000 lines of code before you can even begin writing application code. This has resulted in each of these projects having to implement the same baseline to abstract the non-compute-related features of the Vulkan SDK.
This large amount of non-standardised boilerplate can result in limited knowledge transfer, a higher chance of unique framework implementation bugs being introduced, etc. We are aiming to address this with Kompute. As of today, we are part of the Linux Foundation, and slowly contributing to the cross-vendor GPGPU revolution.
Some of the key features / highlights of Kompute:
* C++ SDK with Flexible Python Package
* BYOV: Bring-your-own-Vulkan design to play nice with existing Vulkan applications
* Asynchronous & parallel processing support through GPU family queues
* Explicit relationships for GPU and host memory ownership and memory management: https://kompute.cc/overview/memory-management.html
* Robust codebase with 90% unit test code coverage: https://kompute.cc/codecov/
* Mobile enabled via Android NDK across several architectures
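To give a feel for the API, below is a minimal sketch of the Python package multiplying two small arrays on the GPU, loosely following the hello-world style examples in the repository docs. Exact names and signatures can differ between Kompute versions, and compile_source (a GLSL-to-SPIR-V helper, e.g. wrapping glslangValidator) is a hypothetical stand-in rather than part of the kp module itself:

    # Minimal Kompute sketch (Python): multiply two small arrays on the GPU.
    # compile_source is a hypothetical GLSL -> SPIR-V helper (e.g. wrapping
    # glslangValidator); it is not assumed to be part of the kp module.
    import numpy as np
    import kp

    SHADER = """
    #version 450
    layout (local_size_x = 1) in;
    layout (set = 0, binding = 0) buffer buf_a { float a[]; };
    layout (set = 0, binding = 1) buffer buf_b { float b[]; };
    layout (set = 0, binding = 2) buffer buf_o { float o[]; };
    void main() {
        uint i = gl_GlobalInvocationID.x;
        o[i] = a[i] * b[i];
    }
    """

    mgr = kp.Manager()  # default device and first compute queue

    tensor_a = mgr.tensor(np.array([2.0, 4.0, 6.0], dtype=np.float32))
    tensor_b = mgr.tensor(np.array([1.0, 2.0, 3.0], dtype=np.float32))
    tensor_out = mgr.tensor(np.zeros(3, dtype=np.float32))
    params = [tensor_a, tensor_b, tensor_out]

    spirv = compile_source(SHADER)        # hypothetical helper
    algo = mgr.algorithm(params, spirv)   # workgroup/constants left at defaults

    # Record the operations and run them synchronously:
    # sync inputs to the GPU, dispatch the shader, sync results back to host.
    (mgr.sequence()
        .record(kp.OpTensorSyncDevice(params))
        .record(kp.OpAlgoDispatch(algo))
        .record(kp.OpTensorSyncLocal(params))
        .eval())

    print(tensor_out.data())  # expected: [ 2.  8. 18.]

The same manager/tensor/sequence flow carries over to the C++ SDK, and the asynchronous and parallel processing support builds on the same pattern by evaluating sequences asynchronously on different GPU family queues.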
Relevant blog posts:
Machine Learning: https://towardsdatascience.com/machine-learning-and-data-pro...
Mobile development: https://towardsdatascience.com/gpu-accelerated-machine-learn...
Game development (we need to update to Godot4): https://towardsdatascience.com/supercharging-game-developmen...