January 27th, 2025

DeepSeek FAQ

DeepSeek's R1 reasoning model has sparked discussion about AI development and U.S.-China relations; the reported $5.576 million training cost of its V3 model has drawn skepticism and could reshape the AI landscape for major tech companies.


DeepSeek has recently made headlines with its R1 reasoning model, which has sparked significant discussion about its implications for AI development, particularly in the context of U.S.-China relations. R1 builds on innovations introduced in DeepSeek's V2 and V3 models, notably DeepSeekMoE (mixture of experts) and DeepSeekMLA (multi-head latent attention), which make both training and inference markedly more efficient and therefore cheaper. DeepSeek claims that training its V3 model cost only $5.576 million, a figure that has drawn skepticism from industry experts even as the model is said to be competitive with leading models from OpenAI and Anthropic; some suggest DeepSeek used distillation, training on the outputs of stronger existing models, to improve its training data.

The broader implication is a potential shift in the AI landscape. Cheaper inference and the commoditization of AI models would affect major tech companies like Microsoft, Amazon, and Apple, and could change the dynamics of AI partnerships and investments as companies weigh the cost of developing cutting-edge models against the benefits of leveraging existing ones.
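For intuition, here is a minimal sketch of mixture-of-experts routing, the core idea behind DeepSeekMoE: a router scores every expert for each token, only the top-k experts actually run, and their outputs are gated together. All names, shapes, and the top-k value below are illustrative, not DeepSeek's actual configuration.

    import numpy as np

    def moe_layer(x, experts, router_w, top_k=2):
        # x: (d,) token activation; experts: list of (d, d) weight matrices;
        # router_w: (n_experts, d) router weights. Shapes are illustrative.
        scores = router_w @ x                     # one affinity score per expert
        top = np.argsort(scores)[-top_k:]         # indices of the k best experts
        gates = np.exp(scores[top] - scores[top].max())
        gates /= gates.sum()                      # softmax over the chosen experts
        # Only top_k experts execute, so per-token compute scales with k,
        # not with n_experts.
        return sum(g * (experts[i] @ x) for g, i in zip(gates, top))

The efficiency win is that total parameters grow with the number of experts while per-token compute grows only with top_k, which is one reason the claimed training and inference costs can be so low.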

- DeepSeek's R1 model has generated significant discussion about its implications for AI and U.S.-China relations.

- The V3 model's reported training cost of $5.576 million has drawn skepticism in the industry (see the worked arithmetic after this list).

- DeepSeek's innovations may lead to a shift in the AI landscape, impacting major tech companies.

- Distillation techniques may have been used to enhance DeepSeek's training data.

- The commoditization of AI models could influence partnerships and investments in the tech sector.
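On the headline cost figure, a quick worked calculation: DeepSeek's V3 technical report derives the number from rented GPU-hours for the final training run,

    2,788,000 H800 GPU-hours × $2 per GPU-hour = $5,576,000

which is the $5.576 million quoted above. Part of the skepticism is that, per the report itself, this covers only the final run and excludes prior research and ablation experiments.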

7 comments
By @JumpCrisscross - 27 days
“DeepSeek gave the model a set of math, code, and logic questions, and set two reward functions: one for the right answer, and one for the right format that utilized a thinking process.

Moreover, the technique was a simple one: instead of trying to evaluate step-by-step (process supervision), or doing a search of all possible answers (a la AlphaGo), DeepSeek encouraged the model to try several different answers at a time and then graded them according to the two reward functions.”

Sounds like traditional (versus test-based) teaching!
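For the curious, a minimal sketch of the sample-and-grade loop the quoted passage describes. The reward functions, the <think> format check, and the model.generate interface are illustrative placeholders, not DeepSeek's actual code; the group-relative advantage mirrors the GRPO idea from the R1 paper.

    import re

    def answer_reward(sample, correct):
        # Reward 1: did the model reach the right answer?
        return 1.0 if sample.strip().endswith(correct) else 0.0

    def format_reward(sample):
        # Reward 2: did it wrap its reasoning in the expected thinking format?
        return 1.0 if re.search(r"<think>.*</think>", sample, re.S) else 0.0

    def grade_group(model, question, correct, n=8):
        # Sample several candidate answers at once, then grade each one
        # against both reward functions (model.generate is hypothetical).
        samples = [model.generate(question) for _ in range(n)]
        rewards = [answer_reward(s, correct) + format_reward(s) for s in samples]
        mean = sum(rewards) / n
        # Each sample's advantage is its reward relative to the group mean;
        # above-average samples are reinforced, below-average ones discouraged.
        return [(s, r - mean) for s, r in zip(samples, rewards)]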

By @nextworddev - 26 days
The guy hedges and rehedges his point of view so much lol.

"Is it bad? Well, kind of. Actually, it depends. On the other hand, maybe."

By @dkrich - 27 days
This blog has taken an odd turn. Given the timing of this post and its focus on the stock market reaction to DeepSeek, it seems like he is invested in Meta stock. That's fine, and I find no fault with it, except that it's hard to read the past few posts without filtering them through the lens of someone very bullish on Meta, no matter how the facts change.

By @blackoil - 27 days
Funny how such a small company wiped more than half a trillion dollars off the valuations.

Hopefully it will motivate other small orgs/academic institutions to do research in LLM++.

By @libertarian1 - 27 days
Has anyone asked it yet what happened in Tiananmen Square in 1989?

By @grajaganDev - 27 days
Here we go again.

China is schooling Silicon Valley at its own game: first a better scrolling dopamine app, and now a more efficient LLM.

Silicon Valley is a dinosaur at this point, with only itself to blame.