DeepSeek-R1-Distill-Qwen-1.5B Surpasses GPT-4o on Certain Benchmarks
DeepSeek launched its first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1, built around large-scale reinforcement learning. The models are open-sourced, with DeepSeek-R1-Distill-Qwen-32B achieving state-of-the-art results among dense models.
DeepSeek has introduced its first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1, built around large-scale reinforcement learning (RL). DeepSeek-R1-Zero, trained with RL and no prior supervised fine-tuning (SFT), showed impressive reasoning capabilities but suffered from issues such as endless repetition and poor readability. To address this, DeepSeek-R1 incorporates cold-start data before RL and achieves performance on par with OpenAI's models across a range of tasks.

The models have been open-sourced, including several distilled versions based on the Llama and Qwen architectures, with DeepSeek-R1-Distill-Qwen-32B setting new state-of-the-art results among dense models. The development pipeline for DeepSeek-R1 includes two RL stages, aimed at discovering better reasoning patterns and aligning with human preferences, and two SFT stages that seed the model's reasoning and non-reasoning capabilities. The research demonstrates that the reasoning of a larger model can be distilled into smaller models, which then outperform small models trained through RL alone.

Evaluation results show the distilled models performing strongly across a variety of benchmarks, and the open-source release is intended to benefit the research community. Users can access the models via the DeepSeek platform or run them locally, using the recommended generation settings to avoid common issues such as repetition (see the sketch after the key points below).
- DeepSeek-R1-Zero was trained with large-scale reinforcement learning and no prior supervised fine-tuning; DeepSeek-R1 adds cold-start data before RL.
- The models have been open-sourced, including several distilled versions.
- DeepSeek-R1-Distill-Qwen-32B has achieved new state-of-the-art results among dense models.
- The development pipeline includes stages for improving reasoning patterns and aligning with human preferences.
- Distilled models demonstrate superior performance compared to smaller models trained through RL alone.
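On the local-run configurations mentioned above: the DeepSeek-R1 repository recommends sampling with temperature in the 0.5–0.7 range (0.6 suggested) and top_p 0.95, and putting all instructions in the user turn rather than a system prompt. A minimal sketch with Hugging Face transformers, using the published 1.5B distill checkpoint; everything beyond the model ID and those recommended settings is an illustrative assumption, not the authors' reference setup:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" assumes the accelerate package is installed
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Per DeepSeek's usage notes: no system prompt, all instructions in the user turn.
messages = [{
    "role": "user",
    "content": "Prove that the sum of two odd integers is even. "
               "Please reason step by step, and put your final answer within \\boxed{}.",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Temperature ~0.6 with top_p 0.95 is the recommended range; greedy decoding
# is what tends to trigger the endless-repetition failure mode mentioned above.
outputs = model.generate(
    inputs, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```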
Related
DeepSeek-V3, ultra-large open-source AI, outperforms Llama and Qwen on launch
Chinese AI startup DeepSeek launched DeepSeek-V3, a 671 billion parameter model outperforming major competitors. It features cost-effective training, innovative architecture, and is available for testing and commercial use.
Interesting Interview with DeepSeek's CEO
DeepSeek, a Chinese AI startup, has surpassed OpenAI's models in reasoning benchmarks, focusing on foundational AI technology, open-source models, and low-cost APIs, while aiming for artificial general intelligence.
Notes on the New Deepseek v3
DeepSeek-V3, a leading open-source model with 671 billion parameters, excels in reasoning and math tasks, outperforming competitors while being cost-effective; it was trained on 14.8 trillion tokens for about $6 million.
DeepSeek R1
DeepSeek-R1 is a new series of reasoning models built on large-scale reinforcement learning, featuring distilled models that set strong benchmark results. They are open-sourced, available for local use, and licensed under MIT.
Official DeepSeek R1 Now on Ollama
DeepSeek has launched its first generation of reasoning models, matching OpenAI's performance across tasks. Available in sizes from 1.5B to 70B parameters, they are MIT licensed for free use.
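Since the distilled checkpoints are published on Ollama, a quick local smoke test is straightforward. A minimal sketch using the ollama Python client; the deepseek-r1:14b tag is taken from the Ollama library listing, and the prompt is illustrative:

```python
import ollama  # pip install ollama; assumes a local Ollama daemon is running
               # with the model already pulled (e.g. `ollama pull deepseek-r1:14b`)

response = ollama.chat(
    model="deepseek-r1:14b",
    messages=[{"role": "user", "content": "What is the derivative of x^x?"}],
)
# R1-style models emit their chain of thought inside <think>...</think> tags
# before the final answer, so expect that prefix in the output.
print(response["message"]["content"])
```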
It still seems to me that these models are 'dumb' and often don't understand what I'm asking, whereas Claude's intuition is much stronger.
R1 14B even feels weaker to me than Qwen 2.5 14B.
Primary use-case is web technology / coding. Maybe I'm prompting it incorrectly?
So I would not put too much weight on how the models are doing on benchmarks.
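On the prompting question above: DeepSeek's usage notes advise against using a system prompt with R1, with all instructions placed in the user turn, which may matter for coding use cases. A sketch against the platform's OpenAI-compatible API; the deepseek-reasoner model name and base URL follow DeepSeek's public API docs, so treat them as assumptions if they have changed:

```python
from openai import OpenAI

# DeepSeek's platform exposes an OpenAI-compatible API; the key is a placeholder.
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-reasoner",  # DeepSeek-R1 on the platform, per their API docs
    # No system message: R1's usage notes recommend one user turn with everything in it.
    messages=[{
        "role": "user",
        "content": "Review this TypeScript function for bugs and explain your reasoning:\n"
                   "function debounce(fn, ms) { let t; "
                   "return (...a) => { clearTimeout(t); "
                   "t = setTimeout(() => fn(...a), ms); }; }",
    }],
)
print(response.choices[0].message.content)
```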