January 23rd, 2025

DeepSeek and the Effects of GPU Export Controls

DeepSeek launched its V3 model, trained on 2,048 H800 GPUs for $5.5 million, emphasizing efficiency and innovation due to U.S. export controls, while exploring advancements beyond transformer architectures.

DeepSeek recently launched its V3 model, which was trained using 2,048 H800 GPUs at a cost of $5.5 million, significantly less than the estimated $40 million spent on OpenAI's GPT-4. Despite using fewer resources, DeepSeek claims their model meets or exceeds benchmarks set by leading AI models. The U.S. export controls on high-end GPUs to China forced DeepSeek to innovate with the limited H800s, leading to a focus on architectural efficiency rather than sheer computational power. This included advancements in FP8 precision training and novel training frameworks. DeepSeek is backed by High-Flyer, a quant fund, and its CEO emphasizes a long-term vision for artificial general intelligence (AGI) rather than immediate profits. While DeepSeek's achievements are notable, experts caution against overinterpreting the results, suggesting that the path to improved AI may not solely rely on increased hardware resources. The company is also exploring ways to overcome the limitations of transformer architectures, indicating potential future advancements in AI training methodologies.
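
As a rough sanity check on that cost figure: published accounts of V3's training put the total at roughly 2.79 million H800 GPU-hours, priced at about $2 per GPU-hour. A minimal back-of-the-envelope sketch in Python, with both inputs taken from that public reporting rather than from this article:

    # Back-of-the-envelope check of the reported ~$5.5M training cost.
    # Both inputs come from public reporting on DeepSeek-V3, not from
    # this article; treat them as approximate.
    gpu_hours = 2.788e6            # total H800 GPU-hours (reported)
    rate_usd_per_gpu_hour = 2.0    # assumed rental price per GPU-hour

    cost = gpu_hours * rate_usd_per_gpu_hour
    print(f"Estimated training cost: ${cost / 1e6:.2f}M")  # -> $5.58M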

- DeepSeek's V3 model was trained on 2,048 H800 GPUs at a fraction of the cost of leading models.

- U.S. export controls on GPUs led to innovative training methods focused on efficiency.

- DeepSeek is backed by High-Flyer, emphasizing foundational research over quick profits.

- The results suggest that significant AI advancements can be made without extensive hardware resources.

- Future developments may focus on overcoming transformer architecture limitations.
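
The FP8 training mentioned above boils down to storing and multiplying tensors in an 8-bit floating-point format, with a scale factor to preserve dynamic range. Below is a minimal sketch of per-tensor scaled quantization using PyTorch's float8_e4m3fn dtype (available since PyTorch 2.1); it illustrates the concept only and is not DeepSeek's actual training code, which reportedly uses much finer-grained (tile-wise) scaling:

    import torch

    def fp8_quantize(x: torch.Tensor):
        # Map the tensor's value range into FP8 (e4m3) via a per-tensor
        # scale. Illustrative only; production FP8 training uses
        # finer-grained scaling and keeps master weights in higher precision.
        fp8_max = torch.finfo(torch.float8_e4m3fn).max    # 448.0 for e4m3
        scale = fp8_max / x.abs().max().clamp(min=1e-12)
        x_fp8 = (x * scale).to(torch.float8_e4m3fn)       # lossy 8-bit cast
        return x_fp8, scale

    def fp8_dequantize(x_fp8: torch.Tensor, scale: torch.Tensor):
        # Recover an approximate float32 tensor from FP8 plus its scale.
        return x_fp8.to(torch.float32) / scale

    x = torch.randn(4, 4)
    x_fp8, scale = fp8_quantize(x)
    x_hat = fp8_dequantize(x_fp8, scale)
    print("max abs error:", (x - x_hat).abs().max().item())

Halving the bytes per value roughly doubles effective memory bandwidth and matmul throughput on hardware with FP8 support, which is one way to stretch a constrained GPU budget.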

12 comments
By @ioulaum - about 1 month
The Chinese do have their home-grown GPUs too, although I have the impression that they're not super good.

Even so, if we look at Groq / Cerebras, the fastest LLM inference companies:

They're both built on 7nm-or-older process nodes, which China can produce locally despite the export restrictions.

Ultimately, the export controls are mainly just an inconvenience, not a real blocker.

The Chinese don't need to achieve state of the art chip manufacturing to achieve SOTA AI outcomes.

They just need to make custom silicon specialized for the kinds of AI algorithms they want to scale.

Of course, at scale, that means the US should eventually have both lower production costs and lower energy use for consumer AI models, and that Chinese products will likely remain more dependent on the cloud for at least the near future.

The whole strategy seems ultimately meh in a long-term sense... mainly good for building up a sense of mutual enmity and dividing the world... which is also going to result in a higher cost of living around the world as trade falters.

Sad stuff.

By @o999 - about 1 month
It is important to keep in mind that GPU power per dollar is what matters, not power per unit.

China can produce much cheaper electronics that can compete even when they aren't as powerful as NVIDIA's.
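
A toy illustration of that point, with made-up numbers (neither the specs nor the prices below are real): under a fixed budget, a slower but cheaper chip can deliver more aggregate compute.

    # Hypothetical accelerators; all numbers are illustrative, not real specs.
    chips = {
        "imported_flagship": {"tflops": 1000, "unit_cost_usd": 30_000},
        "domestic_budget":   {"tflops": 400,  "unit_cost_usd": 8_000},
    }

    budget_usd = 10_000_000  # fixed hardware budget

    for name, spec in chips.items():
        n_units = budget_usd // spec["unit_cost_usd"]
        total_tflops = n_units * spec["tflops"]
        per_dollar = spec["tflops"] / spec["unit_cost_usd"]
        print(f"{name}: {n_units} units, {total_tflops:,} TFLOPS total, "
              f"{per_dollar:.3f} TFLOPS/$")

    # The cheaper chip wins on total throughput (500,000 vs 333,000 TFLOPS)
    # despite being less than half as fast per unit.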

By @sanjams - about 1 month
> Infrastructure algorithm optimization

> Novel training frameworks

Where can one find more information about these? I keep seeing hand-wavy language like this w.r.t. DeepSeek’s innovation.

By @murtio - about 1 month
> DeepSeek isn't a typical startup - they're backed by High-Flyer, an $8B quant fund. Their CEO Liang Wenfeng built High-Flyer from scratch and seems focused on foundational research over quick profits

How is that useful?

By @whywhywhywhy - about 1 month
Excellent models that need a fraction of the compute were obviously going to come from this. OAI is actually incentivized not to make their models this efficient, because compute is a moat too.

By @Cumpiler69 - about 1 month
Question: What's stopping China from buying GPUs via middlemen in third-party countries that don't restrict exports to China?

I would assume nothing, similar to how exports of Western tech somehow magically exploded overnight to Russia's neighbors and everyone is pretending not to notice because it makes money.

https://i.imgur.com/kDCsxbt.jpeg

By @chvid - about 1 month
DeepSeek shows that it is not the size of your computer that matters most, but rather your talent and the approach you take.

Should have been obvious but now somehow isn't?

By @hendersoon - about 1 month
With $8B in the bank, I have some degree of confidence that DeepSeek evaded the export controls and used full-fat GPUs in addition to the H800s.

By @sinuhe69 - about 1 month
There is also a rumor that they in fact have access to 50,000 H100 GPUs, not just H800s. 50,000 H100s is as big as half of Elon Musk's Colossus!

By @Nyr - about 1 month
This article assumes that they are being truthful and indeed had access only to limited hardware resources, which is doubtful to say the least.
By @sschueller - about 1 month
I still don't understand the insane investments in LLMs with the belief that they will get us to AGI, when that is not possible with LLMs. The limitation isn't compute or model size; it's the core concept of the LLM.