DeepSeek's AI breakthrough bypasses industry-standard CUDA, uses PTX
DeepSeek trained a 671-billion-parameter AI model on 2,048 Nvidia GPUs, claiming roughly tenfold efficiency over competitors. The result rattled Nvidia investors but may broaden access to AI technology.
DeepSeek has made significant advances in the AI sector by training its Mixture-of-Experts (MoE) language model, which has 671 billion parameters, on a cluster of 2,048 Nvidia H800 GPUs. Training took approximately two months and, by DeepSeek's account, was roughly ten times more efficient than that of industry leaders such as Meta. The key to this result is DeepSeek's use of Nvidia's assembly-like PTX (Parallel Thread Execution) layer, which permits fine-grained optimizations that standard CUDA programming does not expose, including advanced pipeline algorithms and GPU-specific configuration tuning. DeepSeek's success has unsettled investors, contributing to a significant drop in Nvidia's stock on the theory that demand for high-performance hardware may fall. Industry figures, including former Intel CEO Pat Gelsinger, suggest that DeepSeek's innovations could democratize AI technology, making it accessible on a wider range of devices. The total financial investment behind the work, however, remains unclear.
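For a concrete sense of what "assembly-like PTX" means in practice: CUDA C++ can embed raw PTX through inline asm, exposing cache-control variants of loads and stores that ordinary CUDA code does not. The sketch below is purely illustrative (the kernel name and the choice of the ld.global.cg hint are assumptions for this example, not DeepSeek's actual code); the .cg qualifier caches the load in L2 and bypasses L1, the kind of knob relevant to the L2-cache tuning DeepSeek describes.

```cuda
// Minimal sketch (not DeepSeek's code): embedding a PTX instruction in CUDA
// via inline asm. "ld.global.cg" loads through L2 only ("cache global"),
// one of the cache-control knobs PTX exposes that plain CUDA C++ does not.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void ptx_load_demo(const float* in, float* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v;
    // PTX load with the .cg cache hint: cache in L2, bypass L1.
    asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(v) : "l"(in + i));
    out[i] = v * 2.0f;
}

int main() {
    const int n = 256;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = float(i);
    ptx_load_demo<<<1, n>>>(in, out);
    cudaDeviceSynchronize();
    printf("out[3] = %f\n", out[3]);  // expect 6.0
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

For this particular hint, CUDA also offers the __ldcg() intrinsic; the point of inline PTX is that it generalizes to instructions and variants CUDA C++ never wrapped.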
- DeepSeek trained its 671-billion-parameter MoE model on 2,048 Nvidia H800 GPUs.
- Achieved roughly 10x the efficiency of competitors such as Meta.
- Used Nvidia's PTX layer for low-level optimizations (a pipelining sketch follows this list).
- Nvidia's stock dropped significantly following DeepSeek's announcement.
- Potential for broader AI applications on less expensive devices.
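The "advanced pipeline algorithms" mentioned above are, in their most generic form, about overlapping data movement with compute. Below is a minimal, hedged sketch of that idea using two CUDA streams and double buffering; the buffer sizes, the scale kernel, and the chunk count are inventions for illustration, not DeepSeek's configuration.

```cuda
// Hedged sketch of the pipelining idea: overlap host-to-device transfers
// with kernel work using two CUDA streams and double buffering.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float* buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] *= 2.0f;
}

int main() {
    const int chunk = 1 << 20;   // elements per chunk (illustrative)
    const int nchunks = 8;
    float* host;
    cudaMallocHost(&host, (size_t)chunk * nchunks * sizeof(float));  // pinned
    for (int i = 0; i < chunk * nchunks; ++i) host[i] = 1.0f;

    float* dev[2];
    cudaMalloc(&dev[0], chunk * sizeof(float));
    cudaMalloc(&dev[1], chunk * sizeof(float));
    cudaStream_t s[2];
    cudaStreamCreate(&s[0]);
    cudaStreamCreate(&s[1]);

    // Alternate buffers/streams: while one stream computes on its buffer,
    // the other stream can be copying the next chunk.
    for (int c = 0; c < nchunks; ++c) {
        int b = c & 1;
        cudaMemcpyAsync(dev[b], host + (size_t)c * chunk,
                        chunk * sizeof(float), cudaMemcpyHostToDevice, s[b]);
        scale<<<(chunk + 255) / 256, 256, 0, s[b]>>>(dev[b], chunk);
        cudaMemcpyAsync(host + (size_t)c * chunk, dev[b],
                        chunk * sizeof(float), cudaMemcpyDeviceToHost, s[b]);
    }
    cudaDeviceSynchronize();
    printf("host[0] = %f\n", host[0]);  // expect 2.0

    cudaStreamDestroy(s[0]);
    cudaStreamDestroy(s[1]);
    cudaFree(dev[0]);
    cudaFree(dev[1]);
    cudaFreeHost(host);
    return 0;
}
```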
Related
DeepSeek's new AI model appears to be one of the best 'open' challengers yet
DeepSeek, a Chinese AI firm, launched DeepSeek V3, an open-source model with 671 billion parameters, excelling in text tasks and outperforming competitors, though limited by regulatory constraints.
DeepSeek and the Effects of GPU Export Controls
DeepSeek launched its V3 model, trained on 2,048 H800 GPUs for $5.5 million, emphasizing efficiency and innovation due to U.S. export controls, while exploring advancements beyond transformer architectures.
Nvidia Stock May Fall as DeepSeek's 'Amazing' AI Model Disrupts OpenAI
Nvidia's stock may decline as DeepSeek's R1 AI model, launched in January 2025, offers similar performance to OpenAI's at lower costs, attracting enterprise interest and increasing competition in the AI market.
Nvidia falls 14% in premarket trading as China’s DeepSeek triggers tech sell-off
Nvidia's stock fell 16% after Chinese startup DeepSeek launched a competitive AI model at low cost, raising concerns about U.S. tech firms' competitiveness and prompting broader market sell-offs.
Nvidia calls China's DeepSeek R1 model 'an excellent AI advancement'
Nvidia praised DeepSeek's R1 model as a significant AI advancement, despite a stock drop, noting its cost-effectiveness and potential to increase GPU demand, while raising questions about large tech investments.
Well, it was quite a reverse culture shock after I moved to the US. I definitely didn't know that "teacher's pet" was a thing, or that my coworker, a brilliant engineer who went to a highly reputed public school, was chased off his school bus simply because he used some poetic words, or that geeks were not that respected in schools, or that a mile wide and an inch deep with great leadership is what Americans revered. In the meantime, I guess other countries more or less picked up the baton of US culture and grew their own geeks.
CUDA is not an industry standard. Vulkan is an industry standard. They did not bypass CUDA... that's like saying that if I use Vulkan I'm bypassing OpenGL. PTX is an alternative low-level interface provided by Nvidia because of how awful CUDA is for high-performance code.
What DeepSeek wrote could only have been written in either PTX or Vulkan.
Any other company could have done this, and the low-latency traders on Wall Street who use Nvidia write their stuff in PTX for obvious reasons.
OpenAI was, is, and always will be absolutely incompetent when it comes to using its hardware effectively... and it's no different from any other company. Reading is not a goddamned superpower! Just read the docs!
> Specifically, we employ customized PTX (Parallel Thread Execution) instructions and auto-tune the communication chunk size, which significantly reduces the use of the L2 cache and the interference to other SMs.
So they have some intrinsic in some part of their training framework. That's it.
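"Auto-tune the communication chunk size" from the quote above is, at its core, an empirical search: time a few candidate sizes and keep the winner. A minimal host-side sketch of that pattern follows; the candidate sizes and the device-to-device copy stand in for whatever transfer DeepSeek actually tunes, and none of this is their code.

```cuda
// Hedged sketch: auto-tuning a communication chunk size by measurement.
// Times chunked async copies for several candidate sizes, keeps the fastest.
// A real tuner would warm up first and average repeated runs.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t total = 64 << 20;  // 64 MiB payload (illustrative)
    char *src, *dst;
    cudaMalloc(&src, total);
    cudaMalloc(&dst, total);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    size_t candidates[] = {256 << 10, 1 << 20, 4 << 20, 16 << 20};
    size_t best_chunk = 0;
    float best_ms = 1e30f;

    for (size_t chunk : candidates) {
        cudaEventRecord(start);
        // Move the whole payload in chunk-sized async pieces.
        for (size_t off = 0; off < total; off += chunk) {
            size_t n = (off + chunk <= total) ? chunk : total - off;
            cudaMemcpyAsync(dst + off, src + off, n, cudaMemcpyDeviceToDevice);
        }
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms;
        cudaEventElapsedTime(&ms, start, stop);
        printf("chunk %8zu KiB: %.3f ms\n", chunk >> 10, ms);
        if (ms < best_ms) { best_ms = ms; best_chunk = chunk; }
    }
    printf("best chunk: %zu KiB\n", best_chunk >> 10);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(src);
    cudaFree(dst);
    return 0;
}
```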
IIRC this is still relatively hardware-agnostic. Can you actually get very far by doing this? From a quick perusal, DeepSeek also uses Triton in the codebase.
Isn't CUDA Nvidia's own child? This sounds like saying "Microsoft = industry standard".
This gives them a few months' head start before Meta and Google start doing the same thing.