Show HN: Attaching to a Virtual GPU over TCP
Thunder Compute provides a flexible, cost-efficient cloud-based GPU service with instant scaling, pay-per-use billing, high utilization rates, and strong security, benefiting enterprises by minimizing idle GPU time.
Thunder Compute offers a cloud-based GPU service designed for flexibility and cost efficiency. Users can scale their GPU usage up or down instantly and are billed only for what they use, eliminating the long-term reservations that often lead to wasted resources. The platform lets developers run existing code without modification, switching from CPU to GPU with a single command. Thunder Compute aims to maximize GPU utilization, reportedly achieving more than five times the efficiency of other cloud providers, where GPUs are typically in use only 15% of the time. The service is designed to be serverless, removing concerns about configuration, quotas, and idle time. Data security is prioritized, with end-to-end encryption and no data stored by Thunder. This model is particularly attractive to enterprises looking to reduce cloud expenses by minimizing idle GPU time.
- Thunder Compute allows instant scaling of GPU usage with pay-per-use billing.
- Users can run existing code without changes and switch between CPU and GPU easily.
- The platform claims over five times the GPU utilization of competing cloud providers.
- Security measures include end-to-end encryption and no data storage.
- The service is designed to eliminate concerns about configuration and idle resources.
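The summary does not say how Thunder Compute actually attaches a GPU over TCP, but the generic technique behind tools like this is API remoting: intercept calls to the GPU driver library on the client, serialize them over the network, and replay them on a machine that owns a real device. Below is a toy sketch of that shape; the port, the opcodes, and the fake device are all invented and say nothing about Thunder's real protocol.

```python
# Toy API-remoting loop: the client serializes each "driver" call as a
# JSON line over TCP; the server replays it against the local device.
# The port, opcodes, and fake device dict are invented for illustration.
import json, socket, threading

srv = socket.create_server(("127.0.0.1", 9999))  # bind before the client connects

def serve():
    device = {"malloc": lambda n: f"dev_ptr_{n}",   # stand-ins for calls into
              "launch": lambda k: f"launched {k}"}  # a real driver library
    conn, _ = srv.accept()
    for line in conn.makefile():
        req = json.loads(line)
        reply = device[req["op"]](*req["args"])
        conn.sendall(json.dumps(reply).encode() + b"\n")

class RemoteGPU:
    """Client-side stub: every call crosses the wire instead of hitting a local GPU."""
    def __init__(self, host="127.0.0.1", port=9999):
        self.sock = socket.create_connection((host, port))
        self.reader = self.sock.makefile()

    def call(self, op, *args):
        self.sock.sendall(json.dumps({"op": op, "args": args}).encode() + b"\n")
        return json.loads(self.reader.readline())

threading.Thread(target=serve, daemon=True).start()
gpu = RemoteGPU()
print(gpu.call("malloc", 1024))     # "dev_ptr_1024"
print(gpu.call("launch", "saxpy"))  # "launched saxpy"
```

The appeal of this design is that the client needs no GPU, no driver, and no code changes beyond pointing the stub at a server, which matches the "switch from CPU to GPU with a single command" pitch.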
Related
Show HN: We made glhf.chat – run almost any open-source LLM, including 405B
The platform runs a wide range of large language models from Hugging Face repo links using vLLM and a GPU scheduler. It offers free beta access, with plans for competitive pricing after the beta enabled by running models multi-tenant.
Ask HN: Would you use a shared GPU cloud tier?
A new cloud instance model allows users to pay only for active GPU usage, but tasks may take about 25% longer than on dedicated GPU instances due to shared resources.
YC closes deal with Google for dedicated compute cluster for AI startups
Google Cloud has launched a dedicated Nvidia GPU and TPU cluster for Y Combinator startups, offering $350,000 in cloud credits and support to enhance AI development and innovation.
TPU transformation: A look back at 10 years of our AI-specialized chips
Google has advanced its AI capabilities with Tensor Processing Units (TPUs), specialized chips for AI workloads, enhancing performance and efficiency, and making them available through Cloud services for external developers.
GPU Restaking – Beyond digital currencies to physical computing resources
GPU Restaking, developed by Bagel, enables simultaneous use of locked GPUs across platforms, promoting a transparent marketplace, direct negotiations, and maximizing value through ownership verification and economic game theory.
- Users express interest in the service's potential for various applications, including video transcoding and machine learning.
- Concerns are raised about network bottlenecks and the efficiency of data transfer, especially for large datasets.
- Several commenters inquire about self-hosting options and compatibility with existing hardware.
- There is a desire for clarity on the pricing model and how the service operates technically.
- Some users are excited about the potential for using the service in gaming and other creative applications.
So I wrote https://github.com/steelbrain/ffmpeg-over-ip and had the server running on the Windows machine and the client on the media server (could be Plex, Emby, Jellyfin, etc.), and it worked flawlessly.
I am not sure how useful it was in reality (usually if you had a nice graphics card you also had a nice CPU), but I had fun playing around with it. There was something fascinating about getting accelerated graphics in a program running in the machine room. I was able to get glquake running like this once.
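Both of these anecdotes follow the same pattern: move the bytes to the machine that has the GPU, do the work there, and stream the results back. Here is a minimal sketch of the ffmpeg-over-ip flavor of the idea, assuming a made-up port and stock ffmpeg flags; this is not steelbrain's actual wire protocol.

```python
# Sketch only: one TCP connection carries source bytes in and encoded
# bytes out. Port 8400 and the h264_nvenc flags are illustrative
# assumptions, not ffmpeg-over-ip's real protocol or configuration.
import socket, subprocess, threading

def pump(src, dst):
    """Copy bytes between a socket and a pipe until EOF, then close dst."""
    read = src.recv if isinstance(src, socket.socket) else src.read
    write = dst.sendall if isinstance(dst, socket.socket) else dst.write
    while chunk := read(65536):
        write(chunk)
    dst.close()

def transcode_server(port=8400):
    srv = socket.create_server(("0.0.0.0", port))
    while True:
        conn, _ = srv.accept()
        # ffmpeg reads the source from stdin and writes MPEG-TS to stdout,
        # so the GPU box never touches the media server's filesystem.
        proc = subprocess.Popen(
            ["ffmpeg", "-i", "pipe:0", "-c:v", "h264_nvenc",
             "-f", "mpegts", "pipe:1"],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
        threading.Thread(target=pump, args=(conn, proc.stdin)).start()
        pump(proc.stdout, conn)  # encoded output flows back to the client

if __name__ == "__main__":
    transcode_server()
```

A client would open one connection, write the source file, half-close its send side, and read the transcoded stream back; the Plex/Emby/Jellyfin box needs no GPU at all.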
But hey, I'm happy to be proven wrong ;)
Does anyone know if this is possible with USB?
I have a DaVinci Resolve license USB dongle that I'd rather not keep plugged into my laptop.
Hmm... well I just watched you run nvidia-smi in a Mac terminal, which is a platform it's explicitly not supported on. My instant assumption is that your tool copies my code into a private server instance and communicates back and forth to run the commands.
Does this platform expose eGPU capabilities if my host machine supports it? Can I run raster workloads or network it with my own CUDA hardware? The actual way your tool and service connect isn't very clear to me, and I assume other developers will be confused too.
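If that guess is right, the CLI is essentially a thin forwarder: it ships the command to a Linux box that has the GPU and prints whatever comes back, which is why nvidia-smi can appear to "work" in a Mac terminal. Here is a hypothetical sketch of that pattern using ssh as the transport; it is not a claim about how Thunder actually does it, and the host name is made up.

```python
# Hypothetical forwarding shim: run an allow-listed command on a remote
# GPU host and print its output locally. "gpu-box.example.com" is made up;
# a real product would use its own daemon and protocol instead of ssh.
import subprocess, sys

ALLOWED = {"nvidia-smi"}  # only forward commands we know are safe

def run_remote(cmd, host="gpu-box.example.com"):
    if cmd[0] not in ALLOWED:
        raise ValueError(f"refusing to forward {cmd[0]!r}")
    result = subprocess.run(["ssh", host, *cmd], capture_output=True, text=True)
    return result.stdout

if __name__ == "__main__":
    print(run_remote(sys.argv[1:] or ["nvidia-smi"]))
```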
This is awesome. Can it do 3D rendering (Vulkan/OpenGL)?
One option is PoCL [1], whose remote backend [2] can forward OpenCL calls to another machine. Another solution is qCUDA [3], which is more specialized towards CUDA.
In addition to these, various virtualization stacks today provide some sort of serialization mechanism for GPU commands so they can be transferred to another host (or process) [4].
One example is the QEMU-based Android Emulator. It uses special translator libraries and a "QEMU Pipe" to efficiently communicate GPU commands from the virtualized Android OS to the host OS [5].
The new Cuttlefish Android emulator [6] uses Gallium3D for transport and the virglrenderer library [7].
I'd expect that the current virtio-gpu implementation in QEMU [8] might make this job even easier, because it includes Android's gfxstream [9] (formerly called "Vulkan Cereal"), which should already support communication over network sockets out of the box (see the sketch after the reference list).
[1] https://github.com/pocl/pocl
[2] https://portablecl.org/docs/html/remote.html
[3] https://github.com/coldfunction/qCUDA
[4] https://www.linaro.org/blog/a-closer-look-at-virtio-and-gpu-...
[5] https://android.googlesource.com/platform/external/qemu/+/em...
[6] https://source.android.com/docs/devices/cuttlefish/gpu
[7] https://cs.android.com/android/platform/superproject/main/+/...
[8] https://www.qemu.org/docs/master/system/devices/virtio-gpu.h...
[9] https://android.googlesource.com/platform/hardware/google/gf...
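For a concrete feel of the command-serialization idea shared by qCUDA, the QEMU Pipe, and gfxstream, here is a back-of-the-envelope encoding in which the guest packs each GPU command as (opcode, length, payload) and the host decodes and replays the stream. The opcodes and payloads are invented; real protocols carry far richer state.

```python
# Back-of-the-envelope command serialization: the guest encodes each GPU
# command as (opcode, payload length, payload) and the host decodes and
# replays it. OP_CLEAR / OP_DRAW and their payloads are invented.
import struct

HDR = struct.Struct("<II")  # little-endian: opcode, payload length
OP_CLEAR, OP_DRAW = 1, 2

def encode(opcode, payload=b""):
    return HDR.pack(opcode, len(payload)) + payload

def decode_stream(buf):
    """Yield (opcode, payload) pairs from a byte stream of commands."""
    off = 0
    while off < len(buf):
        op, n = HDR.unpack_from(buf, off)
        off += HDR.size
        yield op, buf[off:off + n]
        off += n

# Guest side: a frame's worth of commands, ready to cross a socket or pipe.
stream = (encode(OP_CLEAR, struct.pack("<4f", 0, 0, 0, 1))
          + encode(OP_DRAW, struct.pack("<I", 3)))
# Host side: replay against the real GPU (print stands in for that).
for op, payload in decode_stream(stream):
    print("replay", op, payload.hex())
```

The wire format is transport-agnostic, which is why the same trick works over a virtio queue, a pipe into QEMU, or a plain TCP socket.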