DeepSeek and the Effects of GPU Export Controls
DeepSeek launched its V3 model, trained on 2,048 H800 GPUs for $5.5 million. U.S. export controls pushed the company toward efficiency and innovation, and it is exploring advancements beyond transformer architectures.
DeepSeek recently launched its V3 model, trained on 2,048 H800 GPUs at a cost of $5.5 million, significantly less than the estimated $40 million spent on OpenAI's GPT-4. Despite using fewer resources, DeepSeek claims the model meets or exceeds benchmarks set by leading AI models. U.S. export controls on high-end GPUs to China forced DeepSeek to innovate with the weaker H800s, leading to a focus on architectural efficiency rather than sheer computational power, including FP8 precision training (sketched below) and novel training frameworks. DeepSeek is backed by High-Flyer, a quant fund, and its CEO emphasizes a long-term vision for artificial general intelligence (AGI) rather than immediate profits. While DeepSeek's achievements are notable, experts caution against overinterpreting the results, suggesting that the path to improved AI may not rely solely on increased hardware resources. The company is also exploring ways to overcome the limitations of transformer architectures, pointing to potential future advances in AI training methodologies.
- DeepSeek's V3 model was trained on 2,048 H800 GPUs at a fraction of the cost of leading models.
- U.S. export controls on GPUs led to innovative training methods focused on efficiency.
- DeepSeek is backed by High-Flyer, emphasizing foundational research over quick profits.
- The results suggest that significant AI advancements can be made without extensive hardware resources.
- Future developments may focus on overcoming transformer architecture limitations.
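The summary credits FP8 precision training for part of V3's efficiency. As a minimal sketch of the general technique, not DeepSeek's actual framework (which isn't detailed here), the PyTorch snippet below simulates an FP8 (E4M3) matmul with per-tensor scaling; the function name `fp8_matmul` and the scaling scheme are illustrative assumptions, and PyTorch ≥ 2.1 is needed for the `torch.float8_e4m3fn` dtype.

```python
import torch

# Minimal sketch of FP8 (E4M3) mixed-precision matmul with per-tensor
# scaling. Illustrative only -- not DeepSeek's actual training framework.

def fp8_matmul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Scale each operand so its values fit FP8's narrow dynamic range;
    # 448.0 is the largest finite value representable in E4M3.
    a_scale = a.abs().max().clamp(min=1e-12) / 448.0
    b_scale = b.abs().max().clamp(min=1e-12) / 448.0
    a_fp8 = (a / a_scale).to(torch.float8_e4m3fn)   # quantize to FP8
    b_fp8 = (b / b_scale).to(torch.float8_e4m3fn)
    # Upcast for the multiply; real FP8 kernels keep operands in FP8
    # and accumulate the products in higher precision on-chip.
    out = a_fp8.to(torch.float32) @ b_fp8.to(torch.float32)
    return out * (a_scale * b_scale)                # undo the scaling

a = torch.randn(128, 256)
b = torch.randn(256, 64)
ref = a @ b
approx = fp8_matmul(a, b)
print("relative error:", ((approx - ref).norm() / ref.norm()).item())
```

Production FP8 training is considerably more involved: real kernels keep operands in FP8 end to end, accumulate in higher precision, and typically use finer-grained scaling than a single per-tensor factor, but the quantize-scale-dequantize pattern above is the core idea.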
Related
DeepSeek-V3, ultra-large open-source AI, outperforms Llama and Qwen on launch
Chinese AI startup DeepSeek launched DeepSeek-V3, a 671 billion parameter model outperforming major competitors. It features cost-effective training, innovative architecture, and is available for testing and commercial use.
DeepSeek's new AI model appears to be one of the best 'open' challengers yet
DeepSeek, a Chinese AI firm, launched DeepSeek V3, an open-source model with 671 billion parameters, excelling in text tasks and outperforming competitors, though limited by regulatory constraints.
Interesting Interview with DeepSeek's CEO
DeepSeek, a Chinese AI startup, has surpassed OpenAI's models in reasoning benchmarks, focusing on foundational AI technology, open-source models, and low-cost APIs, while aiming for artificial general intelligence.
DeepSeek v3: The Six Million Dollar Model
DeepSeek v3 is an affordable mixture-of-experts model with 37 billion active parameters (of 671 billion total), showing competitive benchmarks but underperforming in output diversity and coherence. Its real-world effectiveness remains to be evaluated.
Notes on the New Deepseek v3
DeepSeek v3, a leading open-source model with 671 billion parameters, excels in reasoning and math tasks, outperforming competitors while being cost-effective, trained on 14.8 trillion tokens for about $6 million.
Even so, if we look at Groq / Cerebras, the fastest LLM inference companies:
They're both based on chips built at 7nm-or-larger process nodes, which China can already produce domestically despite the export restrictions.
Ultimately, the export controls are mainly an inconvenience, not a real blocker.
China doesn't need state-of-the-art chip manufacturing to achieve SOTA AI outcomes.
They just need to make custom silicon specialized for the kinds of AI algorithms they want to scale.
Of course, at scale, that means the US should eventually have both lower production costs and lower energy use for consumer AI, and that Chinese products will likely remain more dependent on the cloud for at least the near future.
The whole strategy seems ultimately meh in the long term... mainly good for building up a sense of mutual enmity and dividing the world, which will also raise the cost of living around the world as trade falters.
Sad stuff.
China can produce much cheaper electronics that compete even when they aren't as powerful as NVIDIA's.
> Novel training frameworks
Where can one find more information about these? I keep seeing hand-wavy language like this w.r.t. DeepSeek's innovations.
How is that useful?
I would assume nothing, similar to how exports of Western tech magically exploded overnight to Russia's neighbors while everyone pretends not to notice because it makes money.
It should have been obvious, but somehow now it isn't?