August 28th, 2024

Cerebras Launches the Fastest AI Inference

Cerebras Systems has launched Cerebras Inference, billed as the fastest AI inference solution available. It claims 20 times the performance of NVIDIA GPU-based systems, processes up to 1,800 tokens per second, and offers significant cost advantages across multiple service tiers.


Cerebras Systems has launched Cerebras Inference, touted as the fastest AI inference solution globally, claiming 20 times the performance of NVIDIA GPU-based systems. The new service processes 1,800 tokens per second for the Llama3.1 8B model and 450 tokens per second for the Llama3.1 70B model, while maintaining full 16-bit precision. Priced at 10 cents per million tokens for the 8B model and 60 cents per million tokens for the 70B model, Cerebras Inference offers a significant cost advantage over traditional GPU solutions, which the company characterizes as 100 times better price-performance for AI workloads. The service is available in three tiers: Free, Developer, and Enterprise, catering to various user needs. The underlying technology is powered by the Cerebras CS-3 system and the Wafer Scale Engine 3 (WSE-3), which the company says offers 7,000 times more memory bandwidth than the NVIDIA H100. The launch is aimed at AI applications requiring real-time processing and has drawn positive feedback from industry leaders. Cerebras plans to support a wider range of AI models over time, broadening its appeal to developers and enterprises alike.

- Cerebras Inference is 20 times faster than NVIDIA GPU solutions.

- It processes up to 1,800 tokens per second for Llama3.1 8B and 450 tokens per second for Llama3.1 70B.

- The pricing model offers significant cost savings compared to traditional GPU options.

- The service is available in Free, Developer, and Enterprise tiers.

- Powered by the Wafer Scale Engine 3, it provides exceptional memory bandwidth and performance.
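The throughput and pricing figures above can be combined into a quick back-of-the-envelope estimate. The sketch below uses only the numbers stated in the announcement; the function name and the example workload size are illustrative, not part of any Cerebras API:

```python
# Published figures: USD per million tokens, and tokens generated per second.
PRICE_PER_M_TOKENS = {"llama3.1-8b": 0.10, "llama3.1-70b": 0.60}
TOKENS_PER_SECOND = {"llama3.1-8b": 1800, "llama3.1-70b": 450}

def cost_and_time(model: str, tokens: int) -> tuple[float, float]:
    """Return (cost in USD, generation time in seconds) for `tokens` output tokens."""
    cost = tokens / 1_000_000 * PRICE_PER_M_TOKENS[model]
    seconds = tokens / TOKENS_PER_SECOND[model]
    return cost, seconds

# Hypothetical example: a 10,000-token response on the 70B model
cost, seconds = cost_and_time("llama3.1-70b", 10_000)
print(f"${cost:.4f}, {seconds:.1f}s")  # roughly $0.0060 and 22.2 seconds
```

At these rates, even long responses cost fractions of a cent, which is the basis of the price-performance claim.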

2 comments
By @Aeolun - 6 months
> We’re thrilled by the amount of developer interest in our instant inference API and will be onboarding more developers every day. Join our waitlist now to secure your spot!

Very, very uncool to place this after the sign-in.

By @ChrisArchitect - 6 months
[dupe] More discussion on blog post: https://news.ycombinator.com/item?id=41369705