So Who Is Building That 100k GPU Cluster for XAI?
Elon Musk's xAI seeks to build a 100,000 GPU cluster for AI projects, requiring a significant number of Nvidia GPUs. A "Gigafactory of Compute" is planned in Memphis, aiming for completion by late 2025.
Elon Musk's companies, including xAI, Tesla, and SpaceX, need a significant number of GPUs for their AI and high-performance computing projects. Musk, who co-founded OpenAI in 2015, established xAI in March 2023 after leaving OpenAI amid a power struggle. xAI has raised $6.4 billion in funding to compete with major players like OpenAI, Google, and Amazon. Musk aims to build a 100,000 GPU cluster for xAI, with the upcoming Grok-2 model requiring 24,000 Nvidia H100 GPUs for training. The Grok-3 model, expected by the end of the year, will also need a similar GPU count.
After a deal for GPU capacity with Oracle fell through, Musk decided to create a "Gigafactory of Compute" in Memphis, Tennessee, to house the GPU cluster. The factory currently has 8 megawatts of power allocated, with plans to increase this to 50 megawatts. The full 100,000 GPU capacity may not be achieved until late 2025. Supermicro is expected to provide the water-cooled machines for the cluster, while Nvidia is likely supplying the networking infrastructure through its Spectrum-X technology. The storage solutions for the cluster remain unspecified, but companies like Vast Data may be involved. Overall, the ambitious project reflects Musk's intent to establish xAI as a formidable competitor in the AI landscape.
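The power figures in the article can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes a 700 W TDP per GPU (the published figure for the SXM variant of the H100; PCIe cards draw less) and a hypothetical 1.5× facility overhead for CPUs, networking, and cooling; neither number comes from the article itself.

```python
H100_TDP_W = 700       # per-GPU TDP, SXM variant (assumption; PCIe is lower)
GPU_COUNT = 100_000    # target cluster size from the article
OVERHEAD = 1.5         # assumed multiplier for hosts, networking, cooling

gpu_power_mw = GPU_COUNT * H100_TDP_W / 1e6   # megawatts for GPUs alone
facility_mw = gpu_power_mw * OVERHEAD          # rough total facility draw

print(gpu_power_mw)   # 70.0 MW just for the GPU silicon
print(facility_mw)    # ~105 MW with the assumed overhead
```

Under these assumptions the planned 50 megawatts would power only a fraction of the full cluster, which is consistent with the article's note that the complete 100,000 GPU capacity may not arrive until late 2025.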
Related
XAI's Memphis Supercluster has gone live, with up to 100,000 Nvidia H100 GPUs
Elon Musk launches xAI's Memphis Supercluster with 100,000 Nvidia H100 GPUs for AI training, aiming for advancements by December. Online status unclear, SemiAnalysis estimates 32,000 GPUs operational. Plans for 150MW data center expansion pending utility agreements. xAI partners with Dell and Supermicro, targeting full operation by fall 2025. Musk's humorous launch time noted.
VCs are still pouring billions into generative AI startups
Investments in generative AI startups reached $12.3 billion in H1 2023, focusing on early-stage ventures. Challenges include legal issues and rising costs, making profitability elusive for many companies.
Ex-Twitter dev reminisces about finding 700 unused Nvidia GPUs after takeover
Tim Zaman, a former Twitter engineer, revealed 700 idle Nvidia V100 GPUs in Twitter's data center post-Elon Musk's acquisition, highlighting inefficiencies in resource management amid rising AI demands.
Four co's are hoarding billions worth of Nvidia GPU chips. Meta has 350K of them
Meta has launched Llama 3.1, a large language model outperforming ChatGPT 4o on some benchmarks. The model's development involved significant investment in Nvidia GPUs, reflecting high demand for AI training resources.
YC closes deal with Google for dedicated compute cluster for AI startups
Google Cloud has launched a dedicated Nvidia GPU and TPU cluster for Y Combinator startups, offering $350,000 in cloud credits and support to enhance AI development and innovation.
More impressive than that though is the 1000+ acres of empty land going south that is prepped for future industry. As a Memphian, I hope Elon sees the potential to build huge scale stuff here in the future. There’s a huge steel mill, a river terminal for direct access to oversized shipping, a wastewater plant next door, a large Valero oil refinery up the street, and a Canadian National intermodal facility down the road.
Literally everything one would need to build millions of robots or trucks or rocket engines or whatever.
https://maps.app.goo.gl/NuHUpg6CcndxyKHK9?g_st=com.google.ma...
Specifically, the XE9680 racks (8× H100 per chassis): 100k H100 GPUs, 300k B200 GPUs, liquid-cooled doors on the racks, 750 racks, and some other minor details.
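The figures in that comment imply a specific rack density. A minimal sketch, using only the numbers given there (8 GPUs per XE9680 chassis, 100k H100s, 750 racks):

```python
GPUS = 100_000          # H100 count from the comment
GPUS_PER_SERVER = 8     # Dell XE9680: 8x H100 per chassis
RACKS = 750             # rack count from the comment

servers = GPUS // GPUS_PER_SERVER       # chassis needed for 100k GPUs
servers_per_rack = servers / RACKS      # implied density per rack

print(servers)                          # 12500 chassis
print(round(servers_per_rack, 1))       # ~16.7 chassis per rack
```

Roughly 17 of these chassis per rack is a very high density for air cooling, which is consistent with the mention of liquid-cooled rack doors.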
It doesn't look like there are any moats to speak of.
If the goal is to create a bigger, faster, smarter version of OpenAI's GPT, then I'm not sure this even sounds logical. The current crop of LLMs is just hallucination-filled Q&A chatbots that are way past their peak hype cycle. There is still no business model or killer product-market fit. I don't know what the investors are expecting, but they are more than willing to throw in over $6 billion. So idk. °_°