January 27th, 2025

Run DeepSeek R1 Dynamic 1.58-bit

DeepSeek-R1 is an open-source alternative to OpenAI's O1, reduced from 720GB to 131GB via dynamic quantization. It runs on a wide range of hardware, with benchmarks showing it still produces valid outputs, apart from occasional bad tokens.

DeepSeek-R1 has emerged as a competitive open-source alternative to OpenAI's O1 reasoning model, and it has now been shrunk dramatically through quantization. The model, originally 720GB, has been compressed to 131GB while remaining functional. This was accomplished by keeping certain sensitive layers at higher precision while quantizing the rest down to 1.58 bits, which preserves output quality far better than naive uniform quantization. The 1.58-bit version fits in 160GB of VRAM for fast inference, or can run with as little as 20GB of RAM on a CPU, albeit much more slowly. Several dynamic quantized versions have been released, and benchmarks indicate that the 1.58-bit model produces valid outputs, although occasional incorrect tokens may appear.

The architecture of DeepSeek R1 uses a mixture-of-experts (MoE) design, in which only a subset of expert sub-networks is activated for each token, so the parameter count can grow without a proportional increase in compute per token. The model's quality was evaluated with a Flappy Bird game-generation task, on which it scored well across several criteria. The dynamic quantization code is available on GitHub, and the model can run on a variety of systems, including machines without GPUs. The blog post gives detailed instructions for downloading and running the model, emphasizing the importance of proper hardware configuration for good performance.
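As a concrete starting point, the quantized files live in the unsloth/DeepSeek-R1-GGUF repository on Hugging Face (linked in the comments below). Here is a minimal sketch of fetching them with the huggingface_hub library; the allow_patterns value is an assumption about how the 1.58-bit shards are named and should be checked against the repository's file listing.

```python
# Minimal sketch: download only the 1.58-bit GGUF shards.
# Assumes `pip install huggingface_hub`; the "*UD-IQ1_S*" pattern is an
# assumed naming convention; verify it against the repo's file list.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="unsloth/DeepSeek-R1-GGUF",
    local_dir="DeepSeek-R1-GGUF",
    allow_patterns=["*UD-IQ1_S*"],  # assumed pattern for the 1.58-bit variant
)
```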

- DeepSeek-R1 is an open-source model rivaling OpenAI's O1.

- The model size was reduced from 720GB to 131GB through selective quantization.

- The 1.58-bit version runs quickly on 160GB of VRAM, or slowly on as little as 20GB of CPU RAM (a runnable sketch follows this list).

- Performance benchmarks show the model generates valid outputs, with some minor errors.

- Dynamic quantization code is available on GitHub for user implementation.

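As a rough illustration of the VRAM/RAM trade-off in the list above, the sketch below uses the llama-cpp-python bindings rather than the llama.cpp CLI the post itself documents; the model path and the n_gpu_layers value are placeholders to tune to your hardware.

```python
# Hypothetical sketch with llama-cpp-python (`pip install llama-cpp-python`).
# n_gpu_layers controls how many transformer layers are offloaded to VRAM:
# 0 keeps everything on the CPU; -1 offloads every layer.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S.gguf",  # placeholder path
    n_gpu_layers=0,   # raise this as far as your VRAM allows
    n_ctx=4096,       # context window; larger values need more memory
)

out = llm("Write Flappy Bird in Python.", max_tokens=512)
print(out["choices"][0]["text"])
```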
4 comments
By @danielhanchen - 3 months
Oh thanks for sharing this! The fork of llama.cpp showing how to do the dynamic quant is here: https://github.com/unslothai/llama.cpp. I also found min_p = 0.05 can help reduce the chance of bad tokens coming up at 1.58-bit (I found it happens roughly once per 8,000 tokens).
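For context, min_p sampling discards candidate tokens whose probability falls below a fraction of the most likely token's probability, which is what filters out the rare bad tokens described above. A hypothetical way to apply that value through the llama-cpp-python bindings (the min_p keyword follows that library's completion API in recent versions; llama.cpp's own CLI exposes the same setting as --min-p):

```python
# Hypothetical sketch: apply min_p = 0.05 at generation time.
# Tokens with probability below 5% of the top token's are dropped.
from llama_cpp import Llama

llm = Llama(model_path="DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S.gguf")  # placeholder

out = llm(
    "Explain mixture-of-experts in one paragraph.",
    max_tokens=256,
    min_p=0.05,  # the value suggested in the comment above
)
print(out["choices"][0]["text"])
```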
By @homarp - 3 months
"The 1.58bit quantization should fit in 160GB of VRAM for fast inference"

Instructions for llama.cpp: https://huggingface.co/unsloth/DeepSeek-R1-GGUF#instructions...