October 16th, 2024

Elon Musk set up 100k Nvidia H200 GPUs in 19 days; normally takes 4 years

Elon Musk's xAI team established a supercluster of 100,000 Nvidia H200 GPUs in 19 days, a process Nvidia CEO Jensen Huang noted typically requires four years, calling it unprecedented.


Elon Musk and the xAI team set up a supercluster of 100,000 Nvidia H200 GPUs in just 19 days, a feat that Nvidia CEO Jensen Huang says typically takes around four years. Huang expressed admiration for Musk's ability to coordinate the complex installation, which included building a new factory equipped with the liquid cooling and power systems the GPUs require. The rapid deployment took the project from the concept phase to operational status, including the first AI training run on the new supercluster. By Huang's account, an average data center would spend three years on planning and logistics for such a project, followed by another year on installation and testing. He called the achievement unprecedented in the industry and said it is unlikely to be replicated soon.

- Elon Musk's xAI team set up 100,000 Nvidia H200 GPUs in 19 days.

- Nvidia CEO Jensen Huang highlighted that this process usually takes four years.

- The project involved building a new factory and complex networking for the GPUs.

- The rapid deployment included the first AI training run on the supercluster.

- Huang described the achievement as unprecedented and unlikely to be duplicated soon.

XAI's Memphis Supercluster has gone live, with up to 100,000 Nvidia H100 GPUs

Elon Musk launches xAI's Memphis Supercluster with 100,000 Nvidia H100 GPUs for AI training, aiming for advancements by December. Online status unclear, SemiAnalysis estimates 32,000 GPUs operational. Plans for 150MW data center expansion pending utility agreements. xAI partners with Dell and Supermicro, targeting full operation by fall 2025. Musk's humorous launch time noted.

Four co's are hoarding billions worth of Nvidia GPU chips. Meta has 350K of them

Meta has launched Llama 3.1, a large language model outperforming ChatGPT 4o on some benchmarks. The model's development involved significant investment in Nvidia GPUs, reflecting high demand for AI training resources.

So Who Is Building That 100k GPU Cluster for XAI?

Elon Musk's xAI seeks to build a 100,000 GPU cluster for AI projects, requiring significant Nvidia GPUs. A "Gigafactory of Compute" is planned in Memphis, aiming for completion by late 2025.

Elon Musk and Larry Ellison Begged Nvidia CEO Jensen Huang for AI GPUs at Dinner

Larry Ellison and Elon Musk met Nvidia's CEO to request more AI GPUs. Oracle plans a Zettascale supercluster with 131,072 GPUs and aims to secure power with nuclear reactors.

xAI's 100k GPUs data center in Memphis is up and running

Elon Musk's xAI data center in Memphis has activated 100,000 Nvidia chips, enabling powerful AI model training for the Grok chatbot, amid energy supply challenges and industry skepticism about feasibility.

5 comments

By @jprd - 7 months

So, is the celebration around how Musk was able to use his money to NDA local officials and bypass normal procedures while consuming vast resources from the community?

I know as a community we aren't actually thinking Musk somehow had any technical input or assistance in this endeavor beyond encouraging the shady or strong-arm biz practices that skirt regulations and labor laws, right?

By @blackeyeblitzar - 7 months

Is this a misleading claim? Wouldn’t it take a lot more time to acquire racks, cables, other computer components, install them all, supply power, cooling, etc? Or is it more like they got all that lined up ahead of time and then just did the GPU portions of the overall project in a short amount of time?

By @sidcool - 7 months

This is impressive. I understand the Musk hate. But credit to the X engineering team. Regardless of Musk.

By @nojvek - 7 months

All the Tesla shareholders. First they approve the gazillion dollar payday for Musk, then Musk takes a huge infra investment and gives it to xAI on Tesla's dime.


