Interesting Interview with DeepSeek's CEO
DeepSeek, a Chinese AI startup, has surpassed OpenAI's models on reasoning benchmarks while focusing on foundational AI technology, open-source models, and low-cost APIs, all in pursuit of artificial general intelligence.
DeepSeek is a Chinese AI startup that has gained attention for its R1 model, which outperformed OpenAI's offerings on various reasoning benchmarks. Founded by Liang Wenfeng, who previously led a successful hedge fund, DeepSeek is fully funded by High-Flyer and focuses on foundational AI technology rather than commercial applications. The company has committed to open-sourcing its models and has initiated a price war in the Chinese AI market by offering low-cost API rates. DeepSeek's architectural advances, such as multi-head latent attention and a sparse mixture-of-experts design, have significantly reduced inference costs, prompting major tech companies to lower their prices (see the sketch after the bullet list below). The company aims to develop artificial general intelligence (AGI) and emphasizes research over commercialization. Liang's leadership style is hands-on, prioritizing original innovation and collaboration with young talent. Despite its low profile, DeepSeek is recognized for its potential to drive significant advances in AI, challenging the notion that Chinese firms primarily focus on application rather than innovation. The company's strategy and achievements have earned respect in the global AI community, positioning it as a formidable player in the industry.
- DeepSeek's R1 model has surpassed OpenAI's offerings on reasoning benchmarks.
- The startup is fully funded by High-Flyer and focuses on foundational AI technology.
- DeepSeek has initiated a price war in the Chinese AI market with low-cost API rates.
- The company emphasizes research and open-source models over commercialization.
- Liang Wenfeng's leadership promotes original innovation and collaboration with young talent.
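As a rough illustration of the sparse mixture-of-experts idea credited above with cutting inference costs, here is a minimal PyTorch sketch of top-k expert routing. The layer sizes, expert count, and top_k value are illustrative assumptions, not DeepSeek's published configuration.

```python
# Minimal sketch of a sparse mixture-of-experts (MoE) layer with top-k routing.
# All sizes here (d_model, n_experts, top_k) are illustrative assumptions,
# not DeepSeek's actual configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (n_tokens, d_model)
        logits = self.router(x)                         # (n_tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the selected experts run for each token, so compute per token
        # scales with top_k, not with the total number of experts.
        for e, expert in enumerate(self.experts):
            token_pos, slot = (idx == e).nonzero(as_tuple=True)
            if token_pos.numel():
                out[token_pos] += weights[token_pos, slot, None] * expert(x[token_pos])
        return out
```

The design point is that the router activates only top_k experts per token, so per-token compute stays small while total model capacity grows with the number of experts.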
Related
DeepSeek v2.5 – open-source LLM comparable to GPT-4o, but 95% less expensive
DeepSeek launched DeepSeek-V2.5, an advanced open-source model with a 128K context length, excelling in math and coding tasks, and offering competitive API pricing for developers.
DeepSeek-V3, ultra-large open-source AI, outperforms Llama and Qwen on launch
Chinese AI startup DeepSeek launched DeepSeek-V3, a 671 billion parameter model outperforming major competitors. It features cost-effective training, innovative architecture, and is available for testing and commercial use.
DeepSeek-V3
DeepSeek has launched DeepSeek-V3, which processes 60 tokens per second, features 671 billion parameters, and remains open source. Pricing changes will take effect after February 8, 2025, with future updates planned.
DeepSeek's new AI model appears to be one of the best 'open' challengers yet
DeepSeek, a Chinese AI firm, launched DeepSeek V3, an open-source model with 671 billion parameters, excelling in text tasks and outperforming competitors, though limited by regulatory constraints.
Show HN: DeepSeek v3 – A 671B parameter AI Language Model
DeepSeek v3 is an advanced AI language model with 671 billion parameters, pre-trained on 14.8 trillion tokens, supporting a 128K context window, and available for commercial use across various hardware platforms.
- Many commenters believe that restrictions on GPU exports have spurred innovation among Chinese developers, allowing DeepSeek to achieve impressive results with fewer resources.
- There are concerns about DeepSeek's long-term competitiveness due to factors like trade wars, censorship, and the open-source nature of its models, which could allow others to replicate its success.
- Some users express skepticism about the hype surrounding DeepSeek, questioning its actual performance compared to established models like ChatGPT and Claude.
- Comments highlight the potential implications of DeepSeek's success for the global AI landscape, including the need for healthy competition and collaboration among AI companies.
- There are mixed feelings about China's approach to AI research, with some praising its lack of restrictions while others express concern over ethical implications and the potential for technological dominance.
Kudos to the DeepSeek team!
To me there are a few structural and fundamental reasons why DeepSeek can never outperform the other models by a wide margin. On par, maybe, as we reach diminishing returns on investment in these models, but not a win by a wide margin.
1. The US trade war with China, which will eventually put DeepSeek's compute availability at a disadvantage, if it ever comes to that.
2. Chinese censorship, which limits DeepSeek's data ingestion and output to some degree.
3. Most importantly, DeepSeek is open source, which means the other labs are free to copy whatever secret sauce it has, e.g. whatever architecture purportedly uses less compute can easily be copied.
I've been using Gemini, ChatGPT, DeepSeek, and Claude on a regular basis. DeepSeek is neither better nor worse than the others. But this says more about my own limited usage of LLMs than about the usefulness of the models.
I want to know what exactly makes everyone think that DeepSeek totally owns the LLM space. Am I missing anything?
PS: I am a Malaysian Chinese, so I am certainly not "a westerner who is jealous and fearful of the rise of China".
China has incredibly strong incentives to do the pure research needed to break the current GPU-or-else lock. I hope, for science's sake, we don't end up gunning down each other's mathematicians on the streets of Vienna the way certain nuclear physicists seem to have been targeted.
Seems wild that a top 4 quant hedge fund is only $8B?
I used Mixtral a lot for coding Rust, and it had qualities no other model had except GPT-3.5 and later Claude Sonnet. The funny thing is Mixtral was based on Llama 2, which was not trained on code that much.
DeepSeek v3: 671B parameters in total, with only 37B activated, sounds very good, even though it's impossible to run locally.
A question, if anyone happens to know: for each query, does it activate just that many parameters, 37B, and no more?
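As a rough answer sketch, assuming a standard mixture-of-experts setup: every token passes through the shared (dense) parameters plus only the top-k routed experts in each MoE layer, so roughly 37B parameters do work per token while all 671B still have to be held in memory. The counts below are illustrative assumptions in the reported ballpark, not DeepSeek-V3's published breakdown.

```python
# Back-of-the-envelope check of "671B total, 37B active" for an MoE model.
# All numbers below are illustrative assumptions, not DeepSeek-V3's exact breakdown.
shared = 16.5e9      # params every token uses: embeddings, attention, dense layers (assumed)
n_moe_layers = 58    # layers with routed experts (assumed)
n_experts = 256      # routed experts per MoE layer (assumed)
top_k = 8            # experts each token is routed to (assumed)
per_expert = 44e6    # parameters per expert (assumed)

total  = shared + n_moe_layers * n_experts * per_expert
active = shared + n_moe_layers * top_k * per_expert

print(f"total  = {total/1e9:.0f}B")   # ~670B parameters stored in memory
print(f"active = {active/1e9:.0f}B")  # ~37B parameters used per token
```

So "activates 37B" describes compute per token, not memory: the full 671B weights must still be loaded, which is why running it locally is impractical.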
Not what you'd normally expect the model to say about itself:
"I'm an AI language model called ChatGPT, created by OpenAI. Specifically, I'm based on the GPT-4 architecture, which is designed to understand and generate human-like text based on the input I receive. My training data includes a wide range of information up until October 2023, and I can assist with answering questions, generating text, and much more. How can I help you today?"
This tells us something about using synthetic data to bootstrap a new model. All those clauses in the terms of service about not using the model to develop competing AI? Yeah, good luck with that.
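A minimal sketch of the synthetic-data bootstrap the comment alludes to: sample responses from an existing "teacher" model and save the pairs as supervised fine-tuning data for a new model. The prompts, output format, and teacher model name here are illustrative assumptions, not any lab's confirmed pipeline.

```python
# Minimal sketch of bootstrapping training data from a teacher model
# ("distillation"). The prompts, model name, and output format are
# illustrative assumptions, not any lab's confirmed pipeline.
import json
from openai import OpenAI  # official openai-python package

client = OpenAI()  # reads OPENAI_API_KEY from the environment
prompts = [
    "Explain mixture-of-experts routing in two sentences.",
    "Write a Python function that reverses a linked list.",
]

with open("synthetic_sft.jsonl", "w") as f:
    for prompt in prompts:
        resp = client.chat.completions.create(
            model="gpt-4o",  # hypothetical teacher choice
            messages=[{"role": "user", "content": prompt}],
        )
        answer = resp.choices[0].message.content
        # Each line becomes one supervised fine-tuning example for the student.
        f.write(json.dumps({"prompt": prompt, "response": answer}) + "\n")
```

A student fine-tuned on such pairs will also absorb the teacher's self-descriptions, which is one plausible explanation for the quoted reply above.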
I don't think there's any doubt that China can produce some level of tech innovation. I do wonder whether it can be sustained and exploited, given the damage we saw with Alibaba. Although maybe that looks like a more reasonable approach when you see the danger of the opposite happening in the US.
Keep going, China, you’re an inspiration to us all.
Given the expectation that AI will be able to replace humans and increase manufacturing productivity, it should be well guarded, unless you want your foreign competitors to increase their productivity too.
The wise strategy is to sell goods or services, but never to sell the tools used to produce them, such as industrial machines and robots.