August 16th, 2024

We're Cutting L40S Prices in Half

Fly.io has reduced L40S GPU prices to $1.25 per hour, targeting developers running AI workloads. The L40S offers A100-like performance, with a focus on inference tasks and integration with fast networking and storage.

Fly.io has announced a significant price reduction for its NVIDIA L40S GPUs, now available at $1.25 per hour, a move aimed at making GPU-accelerated AI workloads more accessible to developers. The company offers a range of NVIDIA GPUs; the A10 remains the most popular despite its older technology because it handles ad-hoc inference tasks effectively. The L40S, an AI-optimized version of the L40 designed for data center use, delivers performance comparable to the A100, making it a cost-effective option.

Fly.io's strategy reflects a shift in user demand toward inference rather than training workloads, which have different performance requirements. The L40S is positioned as a versatile option for applications including large language models, generative AI, and even gaming. Fly.io emphasizes pairing GPU power with fast networking and storage to support real-time applications, and encourages users to take advantage of the new pricing and capabilities of the L40S in their projects.

- Fly.io has reduced the price of L40S GPUs to $1.25 per hour.

- The A10 GPU remains the most popular among users for its efficiency in inference tasks.

- The L40S offers performance comparable to the A100, targeting cost-sensitive developers.

- The shift in demand is towards inference workloads rather than training jobs.

- Fly.io promotes the integration of GPU power with fast networking and storage for optimal performance.

Related

Intel's Gaudi 3 will cost half the price of Nvidia's H100

Intel's Gaudi 3 AI processor is priced at $15,650, half of Nvidia's H100. Intel aims to compete in the AI market dominated by Nvidia, facing challenges from cloud providers' custom AI processors.

Fly.io initiates Region-specific Machines pricing

Fly.io is moving its Machines service to region-specific pricing over four months, starting in August and settling in November. Users will see per-region charges on invoices, with no immediate changes in July. Commenters raised concerns about price hikes; Fly.io acknowledged display issues and is discussing commitment discounts.

Show HN: We made glhf.chat – run almost any open-source LLM, including 405B

The platform runs a variety of large language models from Hugging Face repo links using vLLM and a GPU scheduler. It offers free beta access, with plans for competitive post-beta pricing based on multi-tenant model serving.

Four co's are hoarding billions worth of Nvidia GPU chips. Meta has 350K of them

Meta has launched Llama 3.1, a large language model that outperforms GPT-4o on some benchmarks. Its development involved significant investment in Nvidia GPUs, reflecting high demand for AI training resources.

Nvidia NVLink and Nvidia NVSwitch Supercharge Large Language Model Inference

NVIDIA's NVLink and NVSwitch technologies enhance multi-GPU performance for large language model inference, enabling efficient communication and real-time processing, while future innovations aim to improve bandwidth and scalability.

8 comments
By @zackangelo - 5 months
The L40S has 48GB of RAM; I'm curious how they're able to run Llama 3.1 70B on it. The weights alone would exceed this. Maybe they mean quantized/fp8?

I just had to implement GPU clustering in my inference stack to support Llama 3.1 70B, and even then I needed 2x A100 80GB SXMs.

I was initially running my inference servers on Fly.io because they were so easy to get started with, but I eventually moved elsewhere because the prices were so high. I pointed out to someone there who e-mailed me that it was really expensive vs. others, and they basically just waved me away.

For reference, you can get an A100 SXM 80GB spot instance on google cloud right now for $2.04/hr ($5.07 regular).
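
A back-of-the-envelope sketch of the memory math behind this comment (a rough, weights-only estimate; KV cache, activations, and framework overhead push real requirements higher):

```python
# Rough weight-only memory footprint for a 70B-parameter model at
# common precisions. Ignores KV cache and activations, which add more.
PARAMS = 70e9
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "fp8/int8": 1.0, "int4": 0.5}

L40S_VRAM_GB = 48
DUAL_A100_VRAM_GB = 2 * 80  # the commenter's 2x A100 80GB setup

for precision, nbytes in BYTES_PER_PARAM.items():
    weights_gb = PARAMS * nbytes / 1e9
    print(f"{precision:9s}: ~{weights_gb:5.0f} GB weights | "
          f"1x L40S (48 GB): {'fits' if weights_gb <= L40S_VRAM_GB else 'no'} | "
          f"2x A100 (160 GB): {'fits' if weights_gb <= DUAL_A100_VRAM_GB else 'no'}")
```

At fp16 the weights alone are ~140 GB, consistent with needing two A100 80GB cards; even fp8 (~70 GB) overflows a single 48 GB L40S, so a one-card deployment would imply roughly 4-bit quantization or sharding across multiple GPUs.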

By @nknealk - 5 months
> You can run DOOM Eternal, building the Stadia that Google couldn’t pull off, because the L40S hasn’t forgotten that it’s a graphics GPU.

Savage.

I wonder if we'll see a resurgence of cloud game streaming.

By @deepsquirrelnet - 5 months
I hadn't even heard of the L40S until I started renting one to get more memory for small training jobs. I didn't benchmark it, but it seemed pretty fast for a PCIe card.

Amazon's g6 instances are L4-based with 24GB of VRAM, half the capacity of the L40S, with SageMaker on-demand prices at around this rate. Vast.ai is cheaper, though it works more like bidding and availability varies.

By @CGamesPlay - 5 months
> You can run Llama 3.1 70B — the big Llama — for LLM jobs.

That's the medium Llama. Does anyone know if an L40S would run the 405B version?

By @tazu - 5 months
Prices lowered to $1.25/hr... still 2X vast.ai prices.

By @layoric - 5 months
Not as fast as the L40S, but Runpod.io has the A40 48GB for $0.28/hr spot price, so if it's mainly VRAM you need, this is a much cheaper option. Vast.ai has it for the same price as well.
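
Pulling together the prices quoted in this thread, a quick cost-per-GB-of-VRAM comparison (spot prices fluctuate and regions differ, so treat these figures as snapshots rather than current quotes):

```python
# Hourly cost per GB of VRAM, using the prices quoted in this thread.
offers = [
    # (offer, VRAM in GB, $/hr)
    ("Fly.io L40S",          48, 1.25),
    ("GCP A100 SXM spot",    80, 2.04),
    ("GCP A100 SXM regular", 80, 5.07),
    ("Runpod A40 spot",      48, 0.28),
]

# Sort by cost per GB-hour, cheapest first.
for name, vram_gb, price in sorted(offers, key=lambda o: o[2] / o[1]):
    cents_per_gb_hr = price / vram_gb * 100
    print(f"{name:22s} {vram_gb:3d} GB  ${price:.2f}/hr  "
          f"{cents_per_gb_hr:.1f} cents/GB-hour")
```

By this measure the spot A40 is the outlier, which fits the comment: when VRAM is the constraint, there are much cheaper options, though raw throughput per dollar is a separate comparison.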

By @blindriver - 5 months
Suddenly cutting prices in half shows that the business model is in dire straits.

By @gedw99 - 5 months
They buy them at around $12K, so they pay them off in approximately a year.

Nice business to be in, I guess.
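
A quick sanity check on that payback estimate (taking the commenter's ~$12K hardware figure and the new $1.25/hr rate at face value, and ignoring power, hosting, and operating costs):

```python
# Payback time for a card bought at ~$12K (the commenter's figure)
# and rented out at $1.25/hr, at a few utilization levels.
CARD_COST = 12_000  # dollars, hardware only
RATE = 1.25         # dollars per hour

for utilization in (1.0, 0.75, 0.5):
    hours = CARD_COST / (RATE * utilization)
    years = hours / (24 * 365)
    print(f"{utilization:.0%} utilization: {hours:,.0f} hours ~= {years:.1f} years")
```

At full utilization that's 9,600 hours, about 1.1 years, which matches the comment; anything less than full utilization stretches the payback out proportionally.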