Show HN: Attaching to a Virtual GPU over TCP
Thunder Compute provides a flexible, cost-efficient cloud-based GPU service with instant scaling, pay-per-use billing, high utilization rates, and strong security, benefiting enterprises by minimizing idle GPU time.
Thunder Compute offers a cloud-based GPU service designed for flexibility and cost efficiency. Users can scale their GPU usage up or down instantly and are billed only for what they use, eliminating the long-term reservations that often lead to wasted resources. The platform lets developers run existing code without modification, switching from CPU to GPU with a single command. Thunder Compute aims to maximize GPU utilization, reportedly achieving more than five times the efficiency of other cloud providers, where GPUs are typically in use only 15% of the time. The service is designed to be serverless, removing concerns about configuration, quotas, and idle time. Data security is prioritized, with end-to-end encryption and no data stored by Thunder. This model is particularly attractive to enterprises looking to reduce cloud expenses by minimizing idle GPU time.
- Thunder Compute allows instant scaling of GPU usage with pay-per-use billing.
- Users can run existing code without changes and switch between CPU and GPU easily.
- The platform claims over five times the GPU utilization of competing cloud providers.
- Security measures include end-to-end encryption and no data storage.
- The service is designed to eliminate concerns about configuration and idle resources.
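The summary does not say how Thunder Compute actually attaches a GPU over TCP, but the generic technique behind tools like this is API remoting: intercept calls to the GPU driver library on the client, serialize them over the network, and replay them on a machine that owns a real device. Below is a toy sketch of that shape; the port, the opcodes, and the fake device are all invented and say nothing about Thunder's real protocol.

```python
# Toy API-remoting loop: the client serializes each "driver" call as a
# JSON line over TCP; the server replays it against the local device.
# The port, opcodes, and fake device dict are invented for illustration.
import json, socket, threading

srv = socket.create_server(("127.0.0.1", 9999))  # bind before the client connects

def serve():
    device = {"malloc": lambda n: f"dev_ptr_{n}",   # stand-ins for calls into
              "launch": lambda k: f"launched {k}"}  # a real driver library
    conn, _ = srv.accept()
    for line in conn.makefile():
        req = json.loads(line)
        reply = device[req["op"]](*req["args"])
        conn.sendall(json.dumps(reply).encode() + b"\n")

class RemoteGPU:
    """Client-side stub: every call crosses the wire instead of hitting a local GPU."""
    def __init__(self, host="127.0.0.1", port=9999):
        self.sock = socket.create_connection((host, port))
        self.reader = self.sock.makefile()

    def call(self, op, *args):
        self.sock.sendall(json.dumps({"op": op, "args": args}).encode() + b"\n")
        return json.loads(self.reader.readline())

threading.Thread(target=serve, daemon=True).start()
gpu = RemoteGPU()
print(gpu.call("malloc", 1024))     # "dev_ptr_1024"
print(gpu.call("launch", "saxpy"))  # "launched saxpy"
```

The appeal of this design is that the client needs no GPU, no driver, and no code changes beyond pointing the stub at a server, which matches the "switch from CPU to GPU with a single command" pitch.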
Related
Show HN: We made glhf.chat – run almost any open-source LLM, including 405B
The platform runs a wide range of large language models from Hugging Face repo links using vLLM and a GPU scheduler. It offers free beta access, with plans for competitive pricing after the beta enabled by running models multi-tenant.
Ask HN: Would you use a shared GPU cloud tier?
A new cloud instance model allows users to pay only for active GPU usage, but tasks may take about 25% longer than on dedicated GPU instances due to shared resources.
YC closes deal with Google for dedicated compute cluster for AI startups
Google Cloud has launched a dedicated Nvidia GPU and TPU cluster for Y Combinator startups, offering $350,000 in cloud credits and support to enhance AI development and innovation.
TPU transformation: A look back at 10 years of our AI-specialized chips
Google has advanced its AI capabilities with Tensor Processing Units (TPUs), specialized chips for AI workloads, enhancing performance and efficiency, and making them available through Cloud services for external developers.
GPU Restaking – Beyond digital currencies to physical computing resources
GPU Restaking, developed by Bagel, enables simultaneous use of locked GPUs across platforms, promoting a transparent marketplace, direct negotiations, and maximizing value through ownership verification and economic game theory.
- Users express interest in the service's potential for various applications, including video transcoding and machine learning.
- Concerns are raised about network bottlenecks and the efficiency of data transfer, especially for large datasets.
- Several commenters inquire about self-hosting options and compatibility with existing hardware.
- There is a desire for clarity on the pricing model and how the service operates technically.
- Some users are excited about the potential for using the service in gaming and other creative applications.
So I wrote https://github.com/steelbrain/ffmpeg-over-ip and had the server running on the Windows machine and the client on the media server (could be Plex, Emby, Jellyfin, etc.), and it worked flawlessly.
I am not sure how useful it was in reality (usually if you had a nice graphics card you also had a nice CPU), but I had fun playing around with it. There was something fascinating about getting accelerated graphics in a program running in the machine room. I was able to get glquake running like this once.
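Both of these anecdotes follow the same pattern: move the bytes to the machine that has the GPU, do the work there, and stream the results back. Here is a minimal sketch of the ffmpeg-over-ip flavor of the idea, assuming a made-up port and stock ffmpeg flags; this is not steelbrain's actual wire protocol.

```python
# Sketch only: one TCP connection carries source bytes in and encoded
# bytes out. Port 8400 and the h264_nvenc flags are illustrative
# assumptions, not ffmpeg-over-ip's real protocol or configuration.
import socket, subprocess, threading

def pump(src, dst):
    """Copy bytes between a socket and a pipe until EOF, then close dst."""
    read = src.recv if isinstance(src, socket.socket) else src.read
    write = dst.sendall if isinstance(dst, socket.socket) else dst.write
    while chunk := read(65536):
        write(chunk)
    dst.close()

def transcode_server(port=8400):
    srv = socket.create_server(("0.0.0.0", port))
    while True:
        conn, _ = srv.accept()
        # ffmpeg reads the source from stdin and writes MPEG-TS to stdout,
        # so the GPU box never touches the media server's filesystem.
        proc = subprocess.Popen(
            ["ffmpeg", "-i", "pipe:0", "-c:v", "h264_nvenc",
             "-f", "mpegts", "pipe:1"],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
        threading.Thread(target=pump, args=(conn, proc.stdin)).start()
        pump(proc.stdout, conn)  # encoded output flows back to the client

if __name__ == "__main__":
    transcode_server()
```

A client would open one connection, write the source file, half-close its send side, and read the transcoded stream back; the Plex/Emby/Jellyfin box needs no GPU at all.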
But hey, I'm happy to be proven wrong ;)
Does anyone know if this is possible with USB?
I have a DaVinci Resolve license USB dongle that I'd rather not keep plugged into my laptop.
Hmm... well I just watched you run nvidia-smi in a Mac terminal, which is a platform it's explicitly not supported on. My instant assumption is that your tool copies my code into a private server instance and communicates back and forth to run the commands.
Does this platform expose eGPU capabilities if my host machine supports it? Can I run raster workloads or network it with my own CUDA hardware? The actual way your tool and service connect isn't very clear to me, and I assume other developers will be confused too.
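If that guess is right, the CLI is essentially a thin forwarder: it ships the command to a Linux box that has the GPU and prints whatever comes back, which is why nvidia-smi can appear to "work" in a Mac terminal. Here is a hypothetical sketch of that pattern using ssh as the transport; it is not a claim about how Thunder actually does it, and the host name is made up.

```python
# Hypothetical forwarding shim: run an allow-listed command on a remote
# GPU host and print its output locally. "gpu-box.example.com" is made up;
# a real product would use its own daemon and protocol instead of ssh.
import subprocess, sys

ALLOWED = {"nvidia-smi"}  # only forward commands we know are safe

def run_remote(cmd, host="gpu-box.example.com"):
    if cmd[0] not in ALLOWED:
        raise ValueError(f"refusing to forward {cmd[0]!r}")
    result = subprocess.run(["ssh", host, *cmd], capture_output=True, text=True)
    return result.stdout

if __name__ == "__main__":
    print(run_remote(sys.argv[1:] or ["nvidia-smi"]))
```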
This is awesome. Can it do 3D rendering (Vulkan/OpenGL)?
One option is PoCL [1], whose remote backend [2] can forward OpenCL calls to another machine. Another solution is qCUDA [3], which is more specialized towards CUDA.
In addition to these, various virtualization stacks today provide some sort of serialization mechanism for GPU commands so they can be transferred to another host (or process) [4].
One example is the QEMU-based Android Emulator. It uses special translator libraries and a "QEMU Pipe" to efficiently communicate GPU commands from the virtualized Android OS to the host OS [5].
The new Cuttlefish Android emulator [6] uses Gallium3D for transport and the virglrenderer library [7].
I'd expect that the current virtio-gpu implementation in QEMU [8] might make this job even easier, because it includes Android's gfxstream [9] (formerly called "Vulkan Cereal"), which should already support communication over network sockets out of the box (see the sketch after the reference list).
[1] https://github.com/pocl/pocl
[2] https://portablecl.org/docs/html/remote.html
[3] https://github.com/coldfunction/qCUDA
[4] https://www.linaro.org/blog/a-closer-look-at-virtio-and-gpu-...
[5] https://android.googlesource.com/platform/external/qemu/+/em...
[6] https://source.android.com/docs/devices/cuttlefish/gpu
[7] https://cs.android.com/android/platform/superproject/main/+/...
[8] https://www.qemu.org/docs/master/system/devices/virtio-gpu.h...
[9] https://android.googlesource.com/platform/hardware/google/gf...
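For a concrete feel of the command-serialization idea shared by qCUDA, the QEMU Pipe, and gfxstream, here is a back-of-the-envelope encoding in which the guest packs each GPU command as (opcode, length, payload) and the host decodes and replays the stream. The opcodes and payloads are invented; real protocols carry far richer state.

```python
# Back-of-the-envelope command serialization: the guest encodes each GPU
# command as (opcode, payload length, payload) and the host decodes and
# replays it. OP_CLEAR / OP_DRAW and their payloads are invented.
import struct

HDR = struct.Struct("<II")  # little-endian: opcode, payload length
OP_CLEAR, OP_DRAW = 1, 2

def encode(opcode, payload=b""):
    return HDR.pack(opcode, len(payload)) + payload

def decode_stream(buf):
    """Yield (opcode, payload) pairs from a byte stream of commands."""
    off = 0
    while off < len(buf):
        op, n = HDR.unpack_from(buf, off)
        off += HDR.size
        yield op, buf[off:off + n]
        off += n

# Guest side: a frame's worth of commands, ready to cross a socket or pipe.
stream = (encode(OP_CLEAR, struct.pack("<4f", 0, 0, 0, 1))
          + encode(OP_DRAW, struct.pack("<I", 3)))
# Host side: replay against the real GPU (print stands in for that).
for op, payload in decode_stream(stream):
    print("replay", op, payload.hex())
```

The wire format is transport-agnostic, which is why the same trick works over a virtio queue, a pipe into QEMU, or a plain TCP socket.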