August 28th, 2024

Cerebras Launches the Fastest AI Inference

Cerebras Systems has launched Cerebras Inference, billed as the fastest AI inference solution available. It claims 20 times the performance of NVIDIA GPU-based systems, processes up to 1,800 tokens per second, and offers significant cost advantages across multiple service tiers.


Cerebras Systems has launched Cerebras Inference, touted as the fastest AI inference solution globally, claiming 20 times the performance of NVIDIA GPU-based systems. The new service processes 1,800 tokens per second for the Llama3.1 8B model and 450 tokens per second for the Llama3.1 70B model, while maintaining full 16-bit precision. Priced at 10 cents per million tokens for the 8B model and 60 cents per million tokens for the 70B model, Cerebras Inference offers a significant cost advantage over traditional GPU solutions, which the company characterizes as 100 times better price-performance for AI workloads. The service is available in three tiers: Free, Developer, and Enterprise, catering to various user needs. The underlying technology is powered by the Cerebras CS-3 system and the Wafer Scale Engine 3 (WSE-3), which the company says offers 7,000 times more memory bandwidth than the NVIDIA H100. The launch is aimed at AI applications requiring real-time processing and has drawn positive feedback from industry leaders. Cerebras plans to support a wider range of AI models over time, broadening its appeal to developers and enterprises alike.

- Cerebras Inference is 20 times faster than NVIDIA GPU solutions.

- It processes up to 1,800 tokens per second for Llama3.1 8B and 450 tokens per second for Llama3.1 70B.

- The pricing model offers significant cost savings compared to traditional GPU options.

- The service is available in Free, Developer, and Enterprise tiers.

- Powered by the Wafer Scale Engine 3, it provides exceptional memory bandwidth and performance.
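The throughput and pricing figures above can be combined into a quick back-of-the-envelope estimate. The sketch below uses only the numbers stated in the announcement; the function name and the example workload size are illustrative, not part of any Cerebras API:

```python
# Published figures: USD per million tokens, and tokens generated per second.
PRICE_PER_M_TOKENS = {"llama3.1-8b": 0.10, "llama3.1-70b": 0.60}
TOKENS_PER_SECOND = {"llama3.1-8b": 1800, "llama3.1-70b": 450}

def cost_and_time(model: str, tokens: int) -> tuple[float, float]:
    """Return (cost in USD, generation time in seconds) for `tokens` output tokens."""
    cost = tokens / 1_000_000 * PRICE_PER_M_TOKENS[model]
    seconds = tokens / TOKENS_PER_SECOND[model]
    return cost, seconds

# Hypothetical example: a 10,000-token response on the 70B model
cost, seconds = cost_and_time("llama3.1-70b", 10_000)
print(f"${cost:.4f}, {seconds:.1f}s")  # roughly $0.0060 and 22.2 seconds
```

At these rates, even long responses cost fractions of a cent, which is the basis of the price-performance claim.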

2 comments
By @Aeolun - 6 months
> We’re thrilled by the amount of developer interest in our instant inference API and will be onboarding more developers every day. Join our waitlist now to secure your spot!

Very, very uncool to place this after the sign-in.

By @ChrisArchitect - 6 months
[dupe] More discussion on blog post: https://news.ycombinator.com/item?id=41369705