DeepSeek-R1-Distill-Qwen-1.5B Surpasses GPT-4o on Certain Benchmarks
DeepSeek launched its first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1, built around large-scale reinforcement learning. The models are open-sourced, with DeepSeek-R1-Distill-Qwen-32B achieving state-of-the-art results among dense models.
DeepSeek has introduced its first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1, built around large-scale reinforcement learning (RL). DeepSeek-R1-Zero, trained with RL and no prior supervised fine-tuning (SFT), showed impressive reasoning capabilities but suffered from issues such as endless repetition and poor readability. To address this, DeepSeek-R1 incorporates cold-start data before RL and achieves performance on par with OpenAI's models across a range of tasks.

The models have been open-sourced, including several distilled versions based on the Llama and Qwen architectures, with DeepSeek-R1-Distill-Qwen-32B setting new state-of-the-art results among dense models. The development pipeline for DeepSeek-R1 includes two RL stages, aimed at discovering better reasoning patterns and aligning with human preferences, and two SFT stages that seed the model's reasoning and non-reasoning capabilities. The research demonstrates that the reasoning of a larger model can be distilled into smaller models, which then outperform small models trained through RL alone.

Evaluation results show the distilled models performing strongly across a variety of benchmarks, and the open-source release is intended to benefit the research community. Users can access the models via the DeepSeek platform or run them locally, using the recommended generation settings to avoid common issues such as repetition (see the sketch after the key points below).
- DeepSeek-R1-Zero was trained with large-scale reinforcement learning and no prior supervised fine-tuning; DeepSeek-R1 adds cold-start data before RL.
- The models have been open-sourced, including several distilled versions.
- DeepSeek-R1-Distill-Qwen-32B has achieved new state-of-the-art results among dense models.
- The development pipeline includes stages for improving reasoning patterns and aligning with human preferences.
- Distilled models demonstrate superior performance compared to smaller models trained through RL alone.
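On the local-run configurations mentioned above: the DeepSeek-R1 repository recommends sampling with temperature in the 0.5–0.7 range (0.6 suggested) and top_p 0.95, and putting all instructions in the user turn rather than a system prompt. A minimal sketch with Hugging Face transformers, using the published 1.5B distill checkpoint; everything beyond the model ID and those recommended settings is an illustrative assumption, not the authors' reference setup:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" assumes the accelerate package is installed
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Per DeepSeek's usage notes: no system prompt, all instructions in the user turn.
messages = [{
    "role": "user",
    "content": "Prove that the sum of two odd integers is even. "
               "Please reason step by step, and put your final answer within \\boxed{}.",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Temperature ~0.6 with top_p 0.95 is the recommended range; greedy decoding
# is what tends to trigger the endless-repetition failure mode mentioned above.
outputs = model.generate(
    inputs, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```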
Related
DeepSeek-V3, ultra-large open-source AI, outperforms Llama and Qwen on launch
Chinese AI startup DeepSeek launched DeepSeek-V3, a 671 billion parameter model outperforming major competitors. It features cost-effective training, innovative architecture, and is available for testing and commercial use.
Interesting Interview with DeepSeek's CEO
DeepSeek, a Chinese AI startup, has surpassed OpenAI's models in reasoning benchmarks, focusing on foundational AI technology, open-source models, and low-cost APIs, while aiming for artificial general intelligence.
Notes on the New Deepseek v3
DeepSeek-V3, a leading open-source model with 671 billion parameters, excels in reasoning and math tasks, outperforming competitors while being cost-effective; it was trained on 14.8 trillion tokens for about $6 million.
DeepSeek R1
DeepSeek-R1 is a new series of reasoning models built on large-scale reinforcement learning, featuring distilled models that set strong benchmark results. They are open-sourced, available for local use, and licensed under MIT.
Official DeepSeek R1 Now on Ollama
DeepSeek has launched its first generation of reasoning models, matching OpenAI's performance across tasks. Available in sizes from 1.5B to 70B parameters, they are MIT licensed for free use.
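Since the distilled checkpoints are published on Ollama, a quick local smoke test is straightforward. A minimal sketch using the ollama Python client; the deepseek-r1:14b tag is taken from the Ollama library listing, and the prompt is illustrative:

```python
import ollama  # pip install ollama; assumes a local Ollama daemon is running
               # with the model already pulled (e.g. `ollama pull deepseek-r1:14b`)

response = ollama.chat(
    model="deepseek-r1:14b",
    messages=[{"role": "user", "content": "What is the derivative of x^x?"}],
)
# R1-style models emit their chain of thought inside <think>...</think> tags
# before the final answer, so expect that prefix in the output.
print(response["message"]["content"])
```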
It still seems to me that these models are 'dumb' and often don't understand what I'm asking, whereas Claude's intuition is much stronger.
R1 14B even feels weaker to me than Qwen 2.5 14B.
Primary use-case is web technology / coding. Maybe I'm prompting it incorrectly?
So I would not put too much weight on how the models are doing on benchmarks.
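On the prompting question above: DeepSeek's usage notes advise against using a system prompt with R1, with all instructions placed in the user turn, which may matter for coding use cases. A sketch against the platform's OpenAI-compatible API; the deepseek-reasoner model name and base URL follow DeepSeek's public API docs, so treat them as assumptions if they have changed:

```python
from openai import OpenAI

# DeepSeek's platform exposes an OpenAI-compatible API; the key is a placeholder.
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-reasoner",  # DeepSeek-R1 on the platform, per their API docs
    # No system message: R1's usage notes recommend one user turn with everything in it.
    messages=[{
        "role": "user",
        "content": "Review this TypeScript function for bugs and explain your reasoning:\n"
                   "function debounce(fn, ms) { let t; "
                   "return (...a) => { clearTimeout(t); "
                   "t = setTimeout(() => fn(...a), ms); }; }",
    }],
)
print(response.choices[0].message.content)
```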