July 28th, 2024

Four co's are hoarding billions worth of Nvidia GPU chips. Meta has 350K of them

Meta has launched Llama 3.1, a large language model that reportedly outperforms OpenAI's GPT-4o on some benchmarks. The model's development involved significant investment in Nvidia GPUs, reflecting the high demand for AI training hardware.

Meta has released Llama 3.1, a new large language model that reportedly outperforms OpenAI's GPT-4o on certain benchmarks. Its largest version, with 405 billion parameters, was trained on up to 16,000 Nvidia H100 GPUs, each valued at roughly $20,000 to $40,000, implying that Meta spent up to $640 million on hardware for this model alone. That purchase is part of a larger goal to amass 350,000 H100s, a fleet worth over $10 billion in Nvidia chips.

Other companies are also stockpiling H100s for AI training. Venture capital firm Andreessen Horowitz reportedly holds over 20,000 GPUs, while Tesla aims for 35,000 to 85,000 H100s, and Elon Musk has stated that xAI's training cluster consists of 100,000 H100s. Demand for these GPUs is high enough that individuals have reportedly been paid to smuggle them into China to evade U.S. export controls. OpenAI's GPU strategy remains less transparent, but it is known to rent significant processing power from Microsoft and Oracle.

Separately, the California Supreme Court upheld Proposition 22, allowing gig companies like Uber and Lyft to continue classifying drivers as independent contractors. The ruling, which preserves those companies' current business model, has drawn mixed reactions because of its implications for worker protections and company costs.
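As a rough sanity check on these figures, here is a small Python sketch that recomputes the hardware cost ranges from the article's estimated unit prices; the per-GPU prices and GPU counts are the article's estimates, not confirmed numbers.

```python
# Back-of-the-envelope check of the GPU cost figures quoted above.
# Unit prices and GPU counts are the article's estimates, not confirmed numbers.

H100_PRICE_LOW = 20_000    # USD per H100, low-end estimate
H100_PRICE_HIGH = 40_000   # USD per H100, high-end estimate

def cost_range(gpu_count: int) -> tuple[int, int]:
    """Return (low, high) hardware cost in USD for a given number of H100s."""
    return gpu_count * H100_PRICE_LOW, gpu_count * H100_PRICE_HIGH

# Llama 3.1 405B training cluster: up to 16,000 H100s
print(cost_range(16_000))   # (320_000_000, 640_000_000) -> up to ~$640M

# Meta's stated target fleet: 350,000 H100s
print(cost_range(350_000))  # (7_000_000_000, 14_000_000_000) -> $7B-$14B
```

The upper bound of the first range matches the $640 million figure above, and the second range matches the $7-14 billion estimate cited in the comments below.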

6 comments
By @SloopJon - 3 months
To me, "hoard" comes with an implication of amassing these in secret and/or not making full use of them. If Meta is crowing about how many they have, and seemingly putting them to good use (or at least some use), I wouldn't call that hoarding.

I occasionally acquire half-decent hardware for stress testing. If I add a pair of Dell servers to the lab (or, heaven help me, anything from IBM), I don't feel like I'm hoarding. What I do feel, after spending $50-500K of someone else's money, is a responsibility to get their money's worth out of it.

I can understand, then, Tesla shareholder frustration with 12,000 of these being diverted to a different neighborhood in Muskville. Talk about other people's money.

By @jsheard - 3 months
That's an estimated $7-14 billion worth of H100s in Meta's racks, and they're using them to make models which they give away for free, with no clear plan for how that's ever going to be worth their while. I'm getting flashbacks to Meta acquiring Oculus a decade ago and still having absolutely no idea how to even make it break even; they just keep dumping endless billions into it in the hopes that it will eventually all be worth it, somehow.

I know people here like to imagine Zuck's AI strategy as 5D chess, but in light of Oculus it's hard for me to see it as anything but a desperate scramble for a pivot to the next big thing after they hit peak Facebook.

By @bcrl - 3 months
Just think of how much e-waste this will be in a few years once it no longer makes sense to run them due to improved FLOPs per Watt of newer silicon.
By @TMWNN - 3 months
Does Google/Alphabet also use the previous-generation A100 GPUs that the article says Microsoft is running, or has Google shifted fully to its own design?
By @deafpolygon - 3 months
You can hardly hoard things that will become obsolete in a few years.
By @xenospn - 3 months
Have they made any money running these?