Interesting Interview with DeepSeek's CEO
DeepSeek, a Chinese AI startup, has surpassed OpenAI's models on reasoning benchmarks while focusing on foundational AI technology, open-source models, and low-cost APIs, all in pursuit of artificial general intelligence.
DeepSeek is a Chinese AI startup that has gained attention for its R1 model, which outperformed OpenAI's offerings on various reasoning benchmarks. Founded by Liang Wenfeng, who previously led a successful hedge fund, DeepSeek is fully funded by High-Flyer and focuses on foundational AI technology rather than commercial applications. The company has committed to open-sourcing its models and has initiated a price war in the Chinese AI market by offering low-cost API rates. DeepSeek's architectural advances, such as multi-head latent attention and a sparse mixture-of-experts design, have significantly reduced inference costs, prompting major tech companies to lower their prices (see the sketch after the bullet list below). The company aims to develop artificial general intelligence (AGI) and emphasizes research over commercialization. Liang's leadership style is hands-on, prioritizing original innovation and collaboration with young talent. Despite its low profile, DeepSeek is recognized for its potential to drive significant advances in AI, challenging the notion that Chinese firms primarily focus on application rather than innovation. The company's strategy and achievements have earned respect in the global AI community, positioning it as a formidable player in the industry.
- DeepSeek's R1 model has surpassed OpenAI's offerings on reasoning benchmarks.
- The startup is fully funded by High-Flyer and focuses on foundational AI technology.
- DeepSeek has initiated a price war in the Chinese AI market with low-cost API rates.
- The company emphasizes research and open-source models over commercialization.
- Liang Wenfeng's leadership promotes original innovation and collaboration with young talent.
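As a rough illustration of the sparse mixture-of-experts idea credited above with cutting inference costs, here is a minimal PyTorch sketch of top-k expert routing. The layer sizes, expert count, and top_k value are illustrative assumptions, not DeepSeek's published configuration.

```python
# Minimal sketch of a sparse mixture-of-experts (MoE) layer with top-k routing.
# All sizes here (d_model, n_experts, top_k) are illustrative assumptions,
# not DeepSeek's actual configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (n_tokens, d_model)
        logits = self.router(x)                         # (n_tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the selected experts run for each token, so compute per token
        # scales with top_k, not with the total number of experts.
        for e, expert in enumerate(self.experts):
            token_pos, slot = (idx == e).nonzero(as_tuple=True)
            if token_pos.numel():
                out[token_pos] += weights[token_pos, slot, None] * expert(x[token_pos])
        return out
```

The design point is that the router activates only top_k experts per token, so per-token compute stays small while total model capacity grows with the number of experts.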
Related
DeepSeek v2.5 – open-source LLM comparable to GPT-4o, but 95% less expensive
DeepSeek launched DeepSeek-V2.5, an advanced open-source model with a 128K context length, excelling in math and coding tasks, and offering competitive API pricing for developers.
DeepSeek-V3, ultra-large open-source AI, outperforms Llama and Qwen on launch
Chinese AI startup DeepSeek launched DeepSeek-V3, a 671 billion parameter model outperforming major competitors. It features cost-effective training, innovative architecture, and is available for testing and commercial use.
DeepSeek-V3
DeepSeek has launched DeepSeek-V3, which processes 60 tokens per second, features 671 billion parameters, and remains open source. Pricing changes will take effect after February 8, 2025, with future updates planned.
DeepSeek's new AI model appears to be one of the best 'open' challengers yet
DeepSeek, a Chinese AI firm, launched DeepSeek V3, an open-source model with 671 billion parameters, excelling in text tasks and outperforming competitors, though limited by regulatory constraints.
Show HN: DeepSeek v3 – A 671B parameter AI Language Model
DeepSeek v3 is an advanced AI language model with 671 billion parameters, pre-trained on 14.8 trillion tokens, supporting a 128K context window, and available for commercial use across various hardware platforms.
- Many commenters believe that restrictions on GPU exports have spurred innovation among Chinese developers, allowing DeepSeek to achieve impressive results with fewer resources.
- There are concerns about DeepSeek's long-term competitiveness due to factors like trade wars, censorship, and the open-source nature of its models, which could allow others to replicate its success.
- Some users express skepticism about the hype surrounding DeepSeek, questioning its actual performance compared to established models like ChatGPT and Claude.
- Comments highlight the potential implications of DeepSeek's success for the global AI landscape, including the need for healthy competition and collaboration among AI companies.
- There are mixed feelings about China's approach to AI research, with some praising its lack of restrictions while others express concern over ethical implications and the potential for technological dominance.
Kudos to the DeepSeek team!
To me there are a few structural and fundamental reasons why DeepSeek can never outperform the other models by a wide margin. On par, maybe, as we reach diminishing returns on investment in these models, but not a win by a wide margin.
1. The US trade war with China, which will eventually put DeepSeek's compute availability at a disadvantage, if it ever comes to that.
2. Chinese censorship, which limits DeepSeek's data ingestion and output to some degree.
3. Most importantly, DeepSeek is open source, which means the other labs are free to copy whatever secret sauce it has, e.g. whatever architecture purportedly uses less compute can easily be copied.
I've been using Gemini, ChatGPT, DeepSeek, and Claude on a regular basis. DeepSeek is neither better nor worse than the others. But this says more about my own limited usage of LLMs than about the usefulness of the models.
I want to know what exactly makes everyone think that DeepSeek totally owns the LLM space. Am I missing anything?
PS: I am a Malaysian Chinese, so I am certainly not "a westerner who is jealous and fearful of the rise of China".
China has incredibly strong incentives to do the pure research needed to break the current GPU-or-else lock. I hope, for science's sake, we don't end up gunning down each other's mathematicians on the streets of Vienna the way certain nuclear physicists seem to have been targeted.
Seems wild that a top 4 quant hedge fund is only $8B?
I used Mixtral a lot for coding Rust, and it had qualities no other model had except GPT-3.5 and later Claude Sonnet. The funny thing is Mixtral was based on Llama 2, which was not trained on code that much.
DeepSeek v3: 671B parameters in total, with only 37B activated, sounds very good, even though it's impossible to run locally.
A question, if anyone happens to know: for each query, does it activate just that many parameters, 37B, and no more?
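As a rough answer sketch, assuming a standard mixture-of-experts setup: every token passes through the shared (dense) parameters plus only the top-k routed experts in each MoE layer, so roughly 37B parameters do work per token while all 671B still have to be held in memory. The counts below are illustrative assumptions in the reported ballpark, not DeepSeek-V3's published breakdown.

```python
# Back-of-the-envelope check of "671B total, 37B active" for an MoE model.
# All numbers below are illustrative assumptions, not DeepSeek-V3's exact breakdown.
shared = 16.5e9      # params every token uses: embeddings, attention, dense layers (assumed)
n_moe_layers = 58    # layers with routed experts (assumed)
n_experts = 256      # routed experts per MoE layer (assumed)
top_k = 8            # experts each token is routed to (assumed)
per_expert = 44e6    # parameters per expert (assumed)

total  = shared + n_moe_layers * n_experts * per_expert
active = shared + n_moe_layers * top_k * per_expert

print(f"total  = {total/1e9:.0f}B")   # ~670B parameters stored in memory
print(f"active = {active/1e9:.0f}B")  # ~37B parameters used per token
```

So "activates 37B" describes compute per token, not memory: the full 671B weights must still be loaded, which is why running it locally is impractical.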
Not what you'd normally expect the model to say about itself:
"I'm an AI language model called ChatGPT, created by OpenAI. Specifically, I'm based on the GPT-4 architecture, which is designed to understand and generate human-like text based on the input I receive. My training data includes a wide range of information up until October 2023, and I can assist with answering questions, generating text, and much more. How can I help you today?"
This tells us something about using synthetic data to bootstrap a new model. All those clauses in the terms of service about not using the model to develop competing AI? Yeah, good luck with that.
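A minimal sketch of the synthetic-data bootstrap the comment alludes to: sample responses from an existing "teacher" model and save the pairs as supervised fine-tuning data for a new model. The prompts, output format, and teacher model name here are illustrative assumptions, not any lab's confirmed pipeline.

```python
# Minimal sketch of bootstrapping training data from a teacher model
# ("distillation"). The prompts, model name, and output format are
# illustrative assumptions, not any lab's confirmed pipeline.
import json
from openai import OpenAI  # official openai-python package

client = OpenAI()  # reads OPENAI_API_KEY from the environment
prompts = [
    "Explain mixture-of-experts routing in two sentences.",
    "Write a Python function that reverses a linked list.",
]

with open("synthetic_sft.jsonl", "w") as f:
    for prompt in prompts:
        resp = client.chat.completions.create(
            model="gpt-4o",  # hypothetical teacher choice
            messages=[{"role": "user", "content": prompt}],
        )
        answer = resp.choices[0].message.content
        # Each line becomes one supervised fine-tuning example for the student.
        f.write(json.dumps({"prompt": prompt, "response": answer}) + "\n")
```

A student fine-tuned on such pairs will also absorb the teacher's self-descriptions, which is one plausible explanation for the quoted reply above.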
I don't think there's any doubt that China can produce some level of tech innovation. I do wonder whether it can be sustained and exploited, given the damage we saw with Alibaba. Although maybe that looks like a more reasonable approach when you see the danger of the opposite happening in the US.
Keep going, China, you’re an inspiration to us all.
Given the expectation that AI will be able to replace humans and increase manufacturing productivity, it should be well guarded, unless you want your foreign competitors to increase their productivity too.
The wise strategy is to sell goods or services, but never to sell the tools used to produce them, such as industrial machines and robots.