We're Cutting L40S Prices in Half
Fly.io has cut L40S GPU pricing to $1.25 per hour, aiming at developers running AI workloads. The L40S offers performance comparable to the A100, is geared toward inference tasks, and pairs with fast networking and storage.
Fly.io has announced a significant price cut for its NVIDIA L40S GPUs, now available at $1.25 per hour, a move aimed at making GPU-accelerated AI workloads more accessible to developers. The company offers a range of NVIDIA GPUs, and the A10 remains the most popular despite its older architecture because it handles most inference tasks well. The L40S, an AI-optimized variant of the L40 designed for data center use, delivers performance comparable to the A100, making it a cost-effective option. The pricing reflects a shift in user demand toward inference rather than training workloads, which have different performance characteristics. Fly.io positions the L40S as a versatile fit for large language models, generative AI, and even gaming, and emphasizes combining GPU power with fast networking and storage to serve real-time applications. The company encourages users to take advantage of the new pricing and capabilities of the L40S in their projects.
- Fly.io has reduced the price of L40S GPUs to $1.25 per hour (a back-of-envelope cost sketch follows this list).
- The A10 GPU remains the most popular among users for its efficiency in inference tasks.
- The L40S offers performance comparable to the A100, targeting cost-sensitive developers.
- The shift in demand is towards inference workloads rather than training jobs.
- Fly.io promotes the integration of GPU power with fast networking and storage for optimal performance.
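As a rough illustration of what the new rate works out to for an always-on machine (simple arithmetic; only the $1.25/hr figure comes from the announcement):

```python
# Back-of-envelope cost of an always-on L40S at Fly.io's new rate.
# Only the $1.25/hr figure is from the announcement; the rest is arithmetic.
HOURLY_USD = 1.25
HOURS_PER_MONTH = 24 * 30  # ~720 hours

print(f"Always-on L40S: ${HOURLY_USD * HOURS_PER_MONTH:,.2f}/month")  # $900.00/month
```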
Related
Intel's Gaudi 3 will cost half the price of Nvidia's H100
Intel's Gaudi 3 AI processor is priced at $15,650, half of Nvidia's H100. Intel aims to compete in the AI market dominated by Nvidia, facing challenges from cloud providers' custom AI processors.
Fly.io initiates Region-specific Machines pricing
Fly.io is changing pricing for Machines service to region-specific rates over four months, starting in August and settling in November. Users will see per region charges on invoices, with no immediate changes in July. Concerns raised about price hikes, acknowledged display issues, and ongoing talks about commitment discounts.
Show HN: We made glhf.chat – run almost any open-source LLM, including 405B
The platform allows running various large language models via Hugging Face repo links using vLLM and GPU scheduler. Offers free beta access with plans for competitive pricing post-beta using multi-tenant model running.
Four co's are hoarding billions worth of Nvidia GPU chips. Meta has 350K of them
Meta has launched Llama 3.1, a large language model outperforming ChatGPT 4o on some benchmarks. The model's development involved significant investment in Nvidia GPUs, reflecting high demand for AI training resources.
Nvidia NVLink and Nvidia NVSwitch Supercharge Large Language Model Inference
NVIDIA's NVLink and NVSwitch technologies enhance multi-GPU performance for large language model inference, enabling efficient communication and real-time processing, while future innovations aim to improve bandwidth and scalability.
I just had to implement GPU clustering in my inference stack to support Llama 3.1 70b, and even then I needed 2xA100 80GB SXMs.
I was initially running my inference servers on fly.io because they were so easy to get started with, but I eventually moved elsewhere because the prices were so high. I pointed out to someone there who emailed me that it was really expensive compared to the alternatives, and they basically just waved me away.
For reference, you can get an A100 SXM 80GB spot instance on google cloud right now for $2.04/hr ($5.07 regular).
Savage.
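For context on the 2xA100 requirement above: it is mostly weight-memory arithmetic, since 70B parameters at fp16 come to about 140 GB before the KV cache. A minimal sketch of the usual tensor-parallel setup, here using vLLM (the commenter's own stack isn't specified; the model ID and settings below are illustrative assumptions):

```python
# Illustrative sketch: serving Llama 3.1 70B across two 80 GB GPUs with vLLM.
# 70e9 params * 2 bytes (fp16) ~= 140 GB of weights alone, plus KV cache,
# so one 80 GB card is not enough; tensor parallelism shards the weights.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",  # assumed Hugging Face model ID
    tensor_parallel_size=2,                          # shard across 2 GPUs
)
outputs = llm.generate(
    ["Explain tensor parallelism in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```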
I wonder if we’ll see a resurgence of cloud game streaming
Amazon’s g6 instances are L4-based with 24 GB of VRAM, half the capacity of the L40S, with SageMaker on-demand prices at this rate. Vast.ai is cheaper, though it’s a bit more like bidding and availability varies.
That's the medium Llama. Does anyone know if an L40S would run the 405B version?
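A rough answer from weight-memory arithmetic alone (48 GB per L40S; KV cache and activations are ignored, so real requirements are higher):

```python
import math

L40S_VRAM_GIB = 48  # per card

def weight_gib(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GiB; ignores KV cache and activations."""
    return params_billions * 1e9 * bytes_per_param / 2**30

for params in (70, 405):
    for label, bpp in (("fp16", 2.0), ("int4", 0.5)):
        need = weight_gib(params, bpp)
        cards = math.ceil(need / L40S_VRAM_GIB)
        print(f"Llama 3.1 {params}B @ {label}: ~{need:.0f} GiB -> >= {cards}x L40S")
```

By this estimate a single L40S cannot hold the 405B model: even 4-bit weights come to roughly 190 GiB, so it would take at least a handful of L40S cards with fast interconnect, before accounting for KV cache.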
Nice business to be in, I guess.