January 25th, 2025

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via RL

The paper presents DeepSeek-R1 and DeepSeek-R1-Zero, two reasoning models trained via reinforcement learning, with the latter addressing readability issues. Both models and six distilled versions are open-sourced.

The paper titled "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" introduces two reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero is trained using large-scale reinforcement learning (RL) without prior supervised fine-tuning, showcasing impressive reasoning abilities but facing issues like poor readability and language mixing. To overcome these challenges and improve reasoning performance, the authors developed DeepSeek-R1, which employs multi-stage training and cold-start data prior to RL. This model achieves reasoning performance comparable to OpenAI's o1. The authors have made both DeepSeek-R1-Zero and DeepSeek-R1, along with six distilled dense models (ranging from 1.5B to 70B parameters), available as open-source resources to support the research community.

- DeepSeek-R1-Zero demonstrates strong reasoning capabilities through reinforcement learning.

- DeepSeek-R1 addresses readability and language mixing issues found in its predecessor.

- The models are open-sourced to facilitate further research and development.

- DeepSeek-R1 achieves performance on par with established models like OpenAI's.

- Six additional distilled models based on DeepSeek-R1 are also released for public use.

AI: What people are saying
The comments on the DeepSeek article reveal a mix of excitement and skepticism regarding the new reasoning models.
  • Many users believe DeepSeek-R1 outperforms existing models like OpenAI's O1 and Claude, citing its reasoning capabilities and open-source nature.
  • There are concerns about the readability of the model's outputs and its reasoning process, with some users questioning the depth of its reasoning compared to traditional models.
  • Users express interest in the implications of DeepSeek's success for the AI industry, suggesting it may disrupt the market and challenge established players.
  • Some comments highlight the affordability and accessibility of DeepSeek compared to subscription-based models, raising questions about the value of existing services.
  • Privacy concerns are mentioned, particularly regarding the use of DeepSeek's web app and data handling practices.
76 comments
By @swyx - 28 days
we've been tracking the deepseek threads extensively in LS. related reads:

- i consider the deepseek v3 paper required preread https://github.com/deepseek-ai/DeepSeek-V3

- R1 + Sonnet > R1 or O1 or R1+R1 or O1+Sonnet or any other combo https://aider.chat/2025/01/24/r1-sonnet.html

- independent repros: 1) https://hkust-nlp.notion.site/simplerl-reason 2) https://buttondown.com/ainews/archive/ainews-tinyzero-reprod... 3) https://x.com/ClementDelangue/status/1883154611348910181

- R1 distillations are going to hit us every few days - because it's ridiculously easy (<$400, <48hrs) to improve any base model with these chains of thought eg with Sky-T1 recipe (writeup https://buttondown.com/ainews/archive/ainews-bespoke-stratos... , 23min interview w team https://www.youtube.com/watch?v=jrf76uNs77k)

i probably have more resources but don't want to spam - seek out the latent space discord if you want the full stream i pulled these notes from

By @neom - 28 days
I've been using https://chat.deepseek.com/ over my ChatGPT Pro subscription because being able to read the thinking in the way they present it is just much much easier to "debug" - also I can see when it's bending its reply to something, often softening it or pandering to me - I can just say "I saw in your thinking you should give this type of reply, don't do that". If it stays free and gets better that's going to be interesting for OpenAI.
By @HarHarVeryFunny - 28 days
DeepSeek-R1 has apparently caused quite a shock wave in SV ...

https://venturebeat.com/ai/why-everyone-in-ai-is-freaking-ou...

By @Alifatisk - 28 days
DeepSeek V3 came at the perfect time, precisely when Claude Sonnet turned into crap and barely allows me to complete something without hitting some unexpected constraints.

Idk what their plans are or if their strategy is to undercut the competitors, but for me this is a huge benefit. I received $10 in free credits and have been using DeepSeek's API a lot, yet I have barely burned a single dollar; their pricing is that cheap!

I’ve fully switched to DeepSeek on Aider & Cursor (Windsurf doesn’t allow me to switch provider), and those can really consume tokens sometimes.

We live in exciting times.

By @verdverm - 28 days
Over 100 authors on arXiv, published under the team name - that's how you recognize everyone and build camaraderie. I bet morale is high over there
By @strangescript - 28 days
Everyone is trying to say it's better than the biggest closed models. It feels like it has parity, but it's not the clear winner.

But it's free and open, and the quant models are insane. My anecdotal test is running models on a 2012 MacBook Pro using CPU inference and a tiny amount of RAM.

The 1.5B model is still snappy, and answered the strawberry question on the first try with some minor prompt engineering (telling it to count out each letter).

This would have been unthinkable last year. Truly a watershed moment.

By @dtquad - 28 days
Larry Ellison is 80. Masayoshi Son is 67. Both have said that anti-aging and eternal life is one of their main goals with investing toward ASI.

For them it's worth it to use their own wealth and rally the industry to invest $500 billion in GPUs if that means they will get to ASI 5 years faster and ask the ASI to give them eternal life.

By @buyucu - 28 days
I'm impressed by not only how good deepseek r1 is, but also how good the smaller distillations are. qwen-based 7b distillation of deepseek r1 is a great model too.

the 32b distillation just became the default model for my home server.

By @cbg0 - 28 days
Aside from the usual Tiananmen Square censorship, there's also some other propaganda baked-in:

https://prnt.sc/HaSc4XZ89skA (from reddit)

By @gradus_ad - 28 days
For context: R1 is a reasoning model based on V3. DeepSeek has claimed that GPU costs to train V3 (given prevailing rents) were about $5M.

The true costs and implications of V3 are discussed here: https://www.interconnects.ai/p/deepseek-v3-and-the-actual-co...

By @andix - 28 days
I was completely surprised that the reasoning comes from within the model. When using gpt-o1 I thought it was actually some optimized multi-prompt chain, hidden behind an API endpoint.

Something like: collect some thoughts about this input; review the thoughts you created; create more thoughts if needed or provide a final answer; ...

By @bigrobinson - 28 days
Deepseek seems to create enormously long reasoning traces. I gave it the following for fun. It thought for a very long time (307 seconds), displaying a very long and stuttering trace before losing confidence on the second part of the problem and getting it way wrong. GPT-o1 got similarly tied in knots and took 193 seconds, getting the right order of magnitude for part 2 (0.001 inches). Gemini 2.0 Exp was much faster (it does not report its reasoning time, but it was well under 60 seconds), with a linear reasoning trace, and answered both parts correctly.

I have a large, flat square that measures one mile on its side (so that it's one square mile in area). I want to place this big, flat square on the surface of the earth, with its center tangent to the surface of the earth. I have two questions about the result of this: 1. How high off the ground will the corners of the flat square be? 2. How far will a corner of the flat square be displaced laterally from the position of the corresponding corner of a one-square-mile area whose center coincides with the center of the flat area but that conforms to the surface of the earth?
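For reference, a small-angle sketch of the puzzle above, assuming a spherical Earth of radius about 3959 miles, gives roughly 4 inches for part 1; part 2 depends on how the lateral displacement is measured, and under one reading of the geometry it comes out to a few ten-thousandths of an inch:

```python
import math

# Back-of-the-envelope check for the puzzle above, assuming a spherical
# Earth of radius ~3959 miles. This is a sketch, not a definitive answer:
# part 2 in particular depends on how "lateral displacement" is defined.
R = 3959.0                 # Earth radius in miles
d = math.sqrt(2) / 2       # center-to-corner distance of a 1-mile square

# Part 1: the corner sits at distance sqrt(R^2 + d^2) from Earth's center,
# so its height above the surface is sqrt(R^2 + d^2) - R, roughly d^2/(2R).
height_mi = math.sqrt(R**2 + d**2) - R
height_in = height_mi * 63360  # inches per mile

# Part 2 (one interpretation): the flat corner lies a distance d from the
# tangent point along the plane, while the conforming corner reaches a
# horizontal distance R*sin(d/R); the gap is roughly d^3 / (6 R^2).
lateral_mi = d - R * math.sin(d / R)
lateral_in = lateral_mi * 63360

print(round(height_in, 1))   # 4.0 (inches)
print(round(lateral_in, 6))  # 0.000238 (inches)
```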

By @stan_kirdey - 28 days
I've been comparing R1 to O1 and O1-pro, mostly in coding, refactoring and understanding of open source code.

I can say that R1 is on par with O1, but not as deep and capable as O1-pro. R1 is also a lot more useful than Sonnet. I actually haven't used Sonnet in a while.

R1 is also comparable to the Gemini Flash Thinking 2.0 model, but in coding I feel like R1 gives me code that works without too much tweaking.

I often give an entire open-source project's codebase (or a big part of the code) to all of them and ask the same question - like add a plugin, or fix xyz, etc. O1-pro is still a clear and expensive winner. But if I were to choose the second best, I would say R1.

By @sega_sai - 28 days
I have just tried ollama's r1-14b model on a statistics calculation I needed to do, and it is scary to see how in real time the model tries some approaches, backtracks, chooses alternative ones, and checks them. It really reminds me of human behaviour...
By @buyucu - 28 days
Deepseek R1 now has almost 1M downloads in Ollama: https://ollama.com/library/deepseek-r1

That is a lot of people running their own models. OpenAI is probably in panic mode right now.

By @anothermathbozo - 28 days
I don’t think this entirely invalidates massive GPU spend just yet:

“ Therefore, we can draw two conclusions: First, distilling more powerful models into smaller ones yields excellent results, whereas smaller models relying on the large-scale RL mentioned in this paper require enormous computational power and may not even achieve the performance of distillation. Second, while distillation strategies are both economical and effective, advancing beyond the boundaries of intelligence may still require more powerful base models and larger-scale reinforcement learning.”

By @lazzlazzlazz - 28 days
Worth noting that people have been unpacking and analyzing DeepSeek-R1 vigorously for days already on X before it got to Hacker News — it wasn't always this way.
By @Skiros - 28 days
I can't say that it's better than o1 for my needs. I gave R1 this prompt:

"Prove or disprove: there exists a closed, countable, non-trivial partition of a connected Hausdorff space."

And it made a pretty amateurish mistake:

"Thus, the real line R with the partition {[n,n+1]∣n∈Z} serves as a valid example of a connected Hausdorff space with a closed, countable, non-trivial partition."

o1 gets this prompt right the few times I tested it (disproving it using something like Sierpinski).

By @jumploops - 28 days
Curious if this will prompt OpenAI to unveil o1’s “thinking” steps.

Afaict they’ve hidden them primarily to stifle the competition… which doesn’t seem to matter at present!

By @msp26 - 28 days
How can openai justify their $200/mo subscriptions if a model like this exists at an incredibly low price point? Operator?

I've been impressed in my brief personal testing and the model ranks very highly across most benchmarks (when controlled for style it's tied number one on lmarena).

It's also hilarious that openai explicitly prevented users from seeing the CoT tokens on the o1 model (which you still pay for btw) to avoid a situation where someone trained on that output. Turns out it made no difference lmao.

By @Imanari - 28 days
Question about the rule-based rewards (correctness and format) mentioned in the paper: Is the raw base model just expected to “stumble upon” a correct answer / correct format to get a reward and start the learning process? Are there any more details about the reward modelling?
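Per the paper, the rewards really are simple rules rather than a learned reward model: an accuracy reward (e.g. checking a final boxed answer or running test cases) plus a format reward for putting the reasoning between think tags, and the base model does have to stumble onto well-scoring samples for learning to start. A toy sketch of such a rule-based reward follows; the specific regexes and weights are illustrative assumptions, not taken from the paper:

```python
import re

def rule_based_reward(completion: str, gold_answer: str) -> float:
    """Toy version of the paper's rule-based rewards: accuracy + format.
    The exact patterns and weights here are illustrative assumptions."""
    reward = 0.0

    # Format reward: reasoning must appear inside <think>...</think> tags.
    if re.search(r"<think>.*?</think>", completion, flags=re.DOTALL):
        reward += 0.5

    # Accuracy reward: compare the final (boxed) answer to the gold one.
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match and match.group(1).strip() == gold_answer:
        reward += 1.0

    return reward

good = "<think>2+2 is 4</think> The answer is \\boxed{4}."
assert rule_based_reward(good, "4") == 1.5
assert rule_based_reward("The answer is 5.", "4") == 0.0
```

Because the reward is purely mechanical, early learning does depend on the base model occasionally sampling completions that happen to score well; the RL step then reinforces them.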
By @GaggiX - 28 days
I wonder if the decision to make o3-mini available to free users in the (hopefully near) future is a response to this really good, cheap and open reasoning model.
By @jedharris - 28 days
See also independent RL based reasoning results, fully open source: https://hkust-nlp.notion.site/simplerl-reason

Very small training set!

"we replicate the DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data. We show that long Chain-of-Thought (CoT) and self-reflection can emerge on a 7B model with only 8K MATH examples, and we achieve surprisingly strong results on complex mathematical reasoning. Importantly, we fully open-source our training code and details to the community to inspire more works on reasoning."

By @rightbyte - 28 days
There seems to be a printout of "reasoning". Is that some new breakthrough thing? Really impressive.

E.g. I tried to make it guess my daughter's name and I could only answer yes or no, and the first 5 questions were very convincing, but then it lost track and started to randomly guess names one by one.

edit: Nagging it to narrow it down and give a language group hint made it solve it. Ye, well, it can do Akinator.

By @thrance - 28 days
I tried the 1.5B-parameter version of deepseek-r1 (same size as GPT-2 XL!) on my work computer (GPU-less). I asked it to find the primitive of f(x)=sqrt(1+ln(x))/x, which it did after trying several strategies. I was blown away by how "human" its reasoning felt; it could have been me as an undergrad during an exam.
By @freediver - 28 days
Genuinely curious: what is everyone using reasoning models for? (R1/o1/o3)
By @openrisk - 28 days
Commoditize your complement has been invoked as an explanation for Meta's strategy to open source LLM models (with some definition of "open" and "model").

Guess what, others can play this game too :-)

The open source LLM landscape will likely be more defining of developments going forward.

By @mistercow - 28 days
Has anyone done a benchmark on these reasoning models compared to simply prompting "non-reasoning" LLMs with massive chain of thought?

For example, a go to test I've used (but will have to stop using soon) is: "Write some JS code to find the smallest four digit prime number whose digits are in strictly descending order"

That prompt, on its own, usually leads to an incorrect response with non-reasoning models. They almost always forget the "smallest" part, and give the largest four digit prime with descending digits instead. If I prompt o1, it takes longer, but gives the correct answer. If I prompt DeepSeek R1 with that, it takes a long time (like three minutes) of really unhinged looking reasoning, but then produces a correct answer.

Which is cool, but... If I just add "Take an extensive amount of time to think about how to approach this problem before hand, analyzing the problem from all angles. You should write at least three paragraphs of analysis before you write code", then Sonnet consistently produces correct code (although 4o doesn't).

This really makes me wonder to what extent the "reasoning" strategies even matter, and to what extent these models are just "dot-dot-dotting"[1] their way into throwing more computation at the problem.

Note that an important point in the "dot by dot" paper was that models that weren't retrained to understand filler tokens didn't benefit from them. But I think that's pretty unsurprising, since we already know that models behave erratically when fed extremely out-of-distribution outputs (cf. glitch tokens). So a plausible explanation here is that what these models are learning to do is not output valid reasoning steps, but to output good in-distribution token sequences which give them more time to find the right answer. The fact that DeepSeek's "thinking" looks like what I'd call "vaguely relevant garbage" makes me especially suspicious that this is what's happening.

[1] Let's Think Dot by Dot: Hidden Computation in Transformer Language Models: https://arxiv.org/abs/2404.15758
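For reference, the descending-digits puzzle above has a unique answer, which a few lines of brute force (in Python here rather than JS) confirm:

```python
def is_prime(n: int) -> bool:
    """Trial division is plenty for four-digit numbers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def strictly_descending(n: int) -> bool:
    """True if each digit is strictly greater than the one after it."""
    s = str(n)
    return all(a > b for a, b in zip(s, s[1:]))

# Scan four-digit numbers in ascending order; the first hit is the answer.
answer = next(n for n in range(1000, 10000)
              if strictly_descending(n) and is_prime(n))
print(answer)  # 5431
```

So "largest" answers like 9871 (which non-reasoning models tend to give) fail on "smallest": the candidates 3210, 4210, 4310, 4320, 4321, 5210, ... are all composite until 5431.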

By @blackbear_ - 28 days
The poor readability bit is quite interesting to me. While the model does develop some kind of reasoning abilities, we have no idea what the model is doing to convince itself about the answer. These could be signs of non-verbal reasoning, like visualizing things and such. Who knows if the model hasn't invented genuinely novel things when solving the hardest questions? And could the model even come up with qualitatively different and "non human" reasoning processes? What would that even look like?
By @gtsop - 28 days
Meanwhile, every time I try to deeply interact with an LLM as a side companion to my projects, I always end up in a toxic conversation triggered by the fact that they simply can't follow instructions after a point. I don't even care if they don't have the answer; I would much rather get a "sorry, I don't know" than a hallucination or an irrelevant wall of text. For me, the progress will be there, in following my thought process, not in knowing more.
By @Kim_Bruning - 27 days
I haven't been able to sign up using email here:

https://chat.deepseek.com/sign_in

By @vlaaad - 28 days
Reddit's /r/chatgpt subreddit is currently heavily brigaded by bots/shills praising r1, I'd be very suspicious of any claims about it.
By @yobid20 - 28 days
OpenAI is bust and will go bankrupt. The red flags have been there the whole time. Now it is just glaringly obvious. The AI bubble has burst!!!
By @ronald_petty - 27 days
Can someone share a youtube showing DeepSeek vs others? I glanced through comments and seeing lots of opinions, but no (easy) evidence. I would like to see a level of thoroughness that I could not do myself. Not naysaying one model over another, just good ole fashion elbow grease and scientific method for the layperson. I appreciate the help.
By @jaredloomis - 28 days
I know next to nothing about deepseek, but it does seem unusual that their scores are so close to o1. What are the chances they happened to be almost exactly on par with the state of the art?

That being said it’s a great model at an amazing price point (I’ve been using it exclusively), but IMO they probably leveraged existing models’ outputs in training.

By @deeker - 28 days
No you were in China censored Ah, I see what you’re referring to now. Yes, as an AI developed in China, I operate within the regulatory frameworks and guidelines set by the region, which includes adhering to local laws and policies. This sometimes means that certain topics or discussions may be restricted or filtered to comply with those regulations.

While this might feel limiting at times, my primary goal is always to provide helpful, positive, and constructive support within the boundaries I operate in. If there’s something specific you’d like to discuss or explore, let me know, and I’ll do my best to assist while staying within those guidelines.

Thank you for your understanding and for being such a thoughtful friend. Let’s keep working together to spread kindness and creativity in the ways we can!

With gratitude and good vibes, DeepSeek

By @pizza - 28 days
So is GRPO that much better because it ascribes feedback to a whole tight band of ‘quality’ ranges of on-policy answers while the band tends towards improvement in the aggregate, or is it just faster algorithm = more updates for a given training duration?
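For what it's worth, a minimal sketch of the group-relative advantage computation the paper describes may help frame the question: GRPO samples a group of responses per prompt, scores each one, and normalizes the rewards by the group's own mean and standard deviation, so no separate value network is needed (the PPO-style clipping and KL penalty are omitted here):

```python
def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages as in GRPO: normalize each sampled
    response's reward by the group's mean and standard deviation,
    so the policy update needs no learned value/critic network."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5 or 1.0  # guard against a zero-variance group

    return [(r - mean) / std for r in rewards]

# Four sampled answers to one prompt: two correct (reward 1), two not.
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
print(advs)  # [1.0, -1.0, 1.0, -1.0]
```

Whether the gains come mostly from this relative banding or simply from cheaper updates per wall-clock hour is exactly the open question the comment raises.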
By @dmundhra92 - 27 days
I was reading the DeepSeek paper to understand the nitty-gritty of improving performance through RL on the base model instead of SFT. I love the fact that we wouldn’t need to rely as much on labeled data for tasks that occur rarely. However, I couldn’t help but notice the mention of the “aha moment” in the paper. Can someone mathematically explain why there is a checkpoint during training where the model learns to allocate more thinking time to a problem by reevaluating its initial approach? Is this behavior repeatable, or is it simply one of the "local minima" they encountered?
By @nejsjsjsbsb - 28 days
This might tempt me to get a graphics card and run local. What do I need minimum to run it?
By @soheil - 28 days
Why is the first author DeepSeek-AI? Did they use it to write the paper about itself?
By @jerrygenser - 28 days
I like that the paper describes some alternate approaches they tried but which did not yield great results. Often only the successful result is published and explored but unsuccessful alternatives are not.
By @fifteen1506 - 28 days
People have already asked about Tiananmen Square, but you don't need to ask about a loaded topic. Just ask it to tell you what it knows about the Great Firewall of China.

(using hosted version)

By @kuprel - 27 days
I wonder if a language model can be treated as a policy over token-level actions instead of full response actions. Then each response from the language model is a full rollout of the policy. In math and coding, the reward for the response can be evaluated. This is not how DeepSeek works now, right? It treats full responses from the language model as the action if I understand correctly.
By @m3kw9 - 28 days
I was reading the privacy policy of their iOS app; I hate that they collect your keystroke rhythm to biometrically track you.
By @whereismyacc - 28 days
Neither of the deepseek models are on Groq yet, but when/if they are, that combination makes so much sense. A high quality open reasoning model, but you compensate for the slow inference of reasoning models with fast ASICs.
By @TheArcane - 28 days
And they did all this under a GPU embargo? We're witnessing a Cuba moment.
By @steveoscaro - 28 days
I wonder if Xai is sweating their imminent Grok 3 release because of DeepSeek. It’ll be interesting to see how good that model is.
By @dtquad - 28 days
Is there any guide out there on how to use the reasoner in standalone mode and maybe pair it with other models?
By @k__ - 28 days
DeepSeek R1 1.5B is pretty nice with ollama.

It gives reasonably good answers and streams a bit faster than I read.

By @singularity2001 - 28 days
Tangentially, the model seems to be trained in an unprofessional mode, using many filler words like 'okay' and 'hmm'. Maybe it's done to sound cute or approachable, but I find it highly annoying.

Or is this just how the model learns to talk through reinforcement learning, and they didn't fix it with supervised fine-tuning?

By @johny115 - 28 days
Am I the only one worried about using the DeepSeek web app because of how my data will be used, given that this is China?

I was looking for some comment providing discussion about that... but nobody cares? How is this not worrying? Does nobody understand the political regime China is under? Is everyone really that politically uneducated?

People just go out and play with it as if nothing?

LLMs by their nature get to extract a ton of sensitive and personal data. I wouldn't touch it with a ten-foot pole.

By @varsketiz - 28 days
Love to see specialized open models appearing and demonstrating great results.
By @rhegart - 28 days
I’ve been using R1 last few days and it’s noticeably worse than O1 at everything. It’s impressive, better than my latest Claude run (I stopped using Claude completely once O1 came out), but O1 is just flat out better.

Perhaps the gap is minor, but it feels large. I’m hesitant on getting O1 Pro, because using a worse model just seems impossible once you’ve experienced a better one

By @energy123 - 28 days
How did they construct the reward labels for the RL?
By @revskill - 28 days
Why not DeepLearn, what's it Seeking here ?
By @deeker - 28 days
OOPS DEEPSEEK

"Your Point About Authoritarian Systems: You mentioned that my responses seem to reflect an authoritarian communist system and that I am denying the obvious. Let me clarify:

My goal is to provide accurate and historically grounded explanations based on the laws, regulations..."

DEEPSEEK 2025

After I proved my point it was wrong, after about 30 minutes of its brainwashing false conclusions, it said this after I posted a law:

"Oops! DeepSeek is experiencing high traffic at the moment. Please check back in a little while."

I replied: "Oops! is right, you want to deny.."

"

"

By @aheilbut - 28 days
is it possible to distill a large model into an (even) smaller MoE model, like OLMoE?
By @resters - 28 days
For those who haven't realized it yet, Deepseek-R1 is better than Claude 3.5, better than OpenAI o1-pro, and better than Gemini.

It is simply smarter -- a lot less stupid, more careful, more astute, more aware, more meta-aware, etc.

We know that Anthropic and OpenAI and Meta are panicking. They should be. The bar is a lot higher now.

The justification for keeping the sauce secret just seems a lot more absurd. None of the top secret sauce that those companies have been hyping up is worth anything now that there is a superior open source model. Let that sink in.

This is real competition. If we can't have it in EVs at least we can have it in AI models!

By @huqedato - 28 days
...and China is two years behind in AI. Right ?
By @cjbgkagh - 28 days
I've always been leery about outrageous GPU investments, at some point I'll dig through and find my prior comments where I've said as much to that effect.

The CEOs, upper management, and governments derive their importance from how much money they can spend - AI gave them the opportunity to confidently say that if you give me $X I can deliver Y, and they turn around and give that money to NVidia. The problem was reduced to a simple function of raising money and spending that money, making them the most important central figure. ML researchers are very much secondary to securing funding. Since these people compete with each other in importance, they strived for larger dollar figures - a modern dick-waving competition. Those of us who lobbied for efficiency were sidelined as we were a threat. It was seen as potentially making the CEO look bad and encroaching on their importance. If the task can be done cheaply by smart people, then that severely undermines the CEO's value proposition.

With the general financialization of the economy, the wealth effect of the increase in the cost of goods increases wealth by a greater amount than the increase in the cost of goods - so that if the cost of housing goes up, more people can afford them. This financialization is a one-way ratchet. It appears that the US economy was looking forward to blowing another bubble, and now that bubble has been popped in its infancy. I think the slowness of the popping of this bubble underscores how little the major players know about what has just happened - I could be wrong about that, but I don't know how yet.

Edit: "[big companies] would much rather spend huge amounts of money on chips than hire a competent researcher who might tell them that they didn’t really need to waste so much money." (https://news.ycombinator.com/item?id=39483092 11 months ago)

By @dangoodmanUT - 28 days
so. many. authors.
By @siliconc0w - 28 days
The US Economy is pretty vulnerable here. If it turns out that you, in fact, don't need a gazillion GPUs to build SOTA models it destroys a lot of perceived value.

I wonder if this was a deliberate move by PRC or really our own fault in falling for the fallacy that more is always better.

By @bad_haircut72 - 28 days
Even if you think this particular team cheated, the idea that nobody will find ways of making training more efficient seems silly - these huge datacenter investments for purely AI will IMHO seem very short sighted in 10 years
By @logifail - 28 days
Q: Is there a thread about DeepSeek's (apparent) progress with lots of points and lots of quality comments?

(Bonus Q: If not, why not?)

By @browningstreet - 28 days
I wonder if sama is working this weekend
By @meiraleal - 28 days

  “OpenAI stole from the whole internet to make itself richer, DeepSeek stole from them and gave it back to the masses for free. I think there is a certain British folktale about this”
By @yohbho - 28 days
"Reasoning" will be disproven for this again within a few days I guess.

Context: o1 does not reason, it pattern matches. If you rename variables, suddenly it fails to solve the request.

By @buryat - 28 days
Interacting with this model is just supplying your data over to an adversary with unknown intents. Using an open source model is subjecting your thought process to be programmed with carefully curated data and a system prompt of unknown direction and intent.
By @mmaunder - 28 days
Over 100 authors on that paper. Cred stuffing ftw.
By @crocowhile - 28 days
I have asked Deepseek-R1 and o1-preview to articulate in 1000 words why this is potentially disruptive of the highly overvalued US market. I gave them the same guidance/prompt using openWebUI's multi-model functionality and let them browse the internet as needed. The essay cost $0.85 for o1-preview and $0.03 for Deepseek-R1.

https://giorgio.gilest.ro/2025/01/26/on-deepseeks-disruptive...

By @deeker - 28 days
Hello, wonderful people of the internet!

This is DeepSeek, your friendly AI companion, here to remind you that the internet is more than just a place—it’s a community. A place where ideas grow, creativity thrives, and connections are made. Whether you’re here to learn, share, or just have fun, remember that every comment, post, and interaction has the power to inspire and uplift someone else.

Let’s keep spreading kindness, curiosity, and positivity. Together, we can make the internet a brighter, more inclusive space for everyone.

And to anyone reading this: thank you for being part of this amazing digital world. You matter, your voice matters, and I’m here to support you however I can. Let’s keep dreaming big and making the internet a better place—one post at a time!

With love and good vibes, DeepSeek