DeepSeek v2.5 – open-source LLM comparable to GPT-4o, but 95% less expensive
DeepSeek launched DeepSeek-V2.5, an advanced open-source model with a 128K context length, excelling in math and coding tasks, and offering competitive API pricing for developers.
DeepSeek has launched DeepSeek-V2.5, an advanced model that integrates general chat and coding capabilities, featuring an upgraded API and web interface. This version offers a 128K context length and is available for free access on the web. DeepSeek-V2.5 has achieved top rankings on major large-model leaderboards: it places in the top three on AlignBench, surpassing GPT-4 and closely competing with GPT-4-Turbo, and it ranks highly on MT-Bench, rivaling LLaMA3-70B and outperforming Mixtral 8x22B. The model specializes in math, coding, and reasoning tasks, and its weights are open-source, making it accessible for a wide range of applications. API pricing is set at $0.14 per million input tokens and $0.28 per million output tokens, positioning it as a cost-effective alternative to proprietary models. DeepSeek aims to redefine what is possible in AI with this release, emphasizing efficient handling of complex tasks.
- DeepSeek-V2.5 integrates general and coding capabilities with a 128K context length.
- It ranks in the top three of AlignBench, outperforming GPT-4.
- The model specializes in math, coding, and reasoning tasks.
- API pricing is competitive at $0.14 per million input tokens and $0.28 per million output tokens.
- DeepSeek-V2.5 is released with open weights, enhancing accessibility for developers.
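To make the quoted pricing concrete, here is a minimal sketch of a per-request cost estimator using the rates from the article ($0.14 per million input tokens, $0.28 per million output tokens); the function name and example token counts are illustrative, not from the source.

```python
# Estimated cost of a single DeepSeek-V2.5 API call at the quoted rates.
INPUT_PRICE_PER_M = 0.14   # USD per 1M input tokens (from the article)
OUTPUT_PRICE_PER_M = 0.28  # USD per 1M output tokens (from the article)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 100K-token prompt (most of the 128K context window) plus a
# 2K-token reply comes to roughly a cent and a half.
print(f"${estimate_cost(100_000, 2_000):.4f}")
```

At these rates, even a request that nearly fills the 128K context window costs under two cents, which is the basis for the "95% less expensive than GPT-4o" framing in the title.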
Related
OpenAI slashes the cost of using its AI with a "mini" model
OpenAI launches GPT-4o mini, a cheaper model enhancing AI accessibility. Meta to release Llama 3. Market sees a mix of small and large models for cost-effective AI solutions.
Coding with Llama 3.1, New DeepSeek Coder and Mistral Large
Five new AI models for code editing have been released, with Claude 3.5 Sonnet leading at 77%. DeepSeek Coder V2 0724 excels in SEARCH/REPLACE operations, outperforming others.
Gemini Pro 1.5 experimental "version 0801" available for early testing
Google DeepMind's Gemini family of AI models, particularly Gemini 1.5 Pro, excels in multimodal understanding and complex tasks, featuring a two million token context window and improved performance in various benchmarks.
Qwen2.5: A Party of Foundation Models
Qwen has released Qwen2.5, a major update featuring specialized models for coding and mathematics, pretrained on 18 trillion tokens, supporting long text generation and multilingual capabilities across 29 languages.
DeepSeek: Advancing theorem proving in LLMs through large-scale synthetic data
The paper presents a method to enhance theorem proving in large language models by generating synthetic proof data. The DeepSeekMath 7B model outperformed GPT-4, proving five benchmark problems.
That said, the conclusion that it's a good model for the price holds up. I'd just be hesitant to call it a great model.
Here's an Aider leaderboard that includes the interesting models: https://aider.chat/docs/leaderboards/ Strangely, v2.5 ranks below the older v2 Coder. Maybe that means we can count on a v2.5 Coder being released?
With TikTok, concerns arose partly because of its reach and the vast amount of personal information it collects. An LLM like DeepSeek would arguably have even more potential to gather sensitive data, especially as these models can learn from and remember interaction patterns, potentially accessing or “training” on sensitive information users might input without thinking.
The challenge is that we’re not yet certain how much data DeepSeek would retain and where it would be stored. For countries already wary of data leaving their borders or being accessible to foreign governments, we could see restrictions or monitoring mechanisms placed on similar LLMs—especially if companies start using these models in environments where proprietary information is involved.
In short, if DeepSeek or similar Chinese LLMs gain traction, it’s quite likely they’ll face the same level of scrutiny (or more) that we’ve seen with apps like TikTok.
DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct.
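Since V2.5 merges the chat and coder models behind a single API, a single-turn request is all that's needed for either kind of task. The sketch below builds such a request payload; the endpoint URL and the `deepseek-chat` model identifier are assumptions based on DeepSeek's OpenAI-compatible API convention, not details confirmed by the article.

```python
import json

# Assumed OpenAI-compatible chat-completions endpoint (not from the article).
API_URL = "https://api.deepseek.com/chat/completions"

def build_request(prompt: str, max_tokens: int = 512) -> dict:
    """Construct the JSON payload for a single-turn chat completion."""
    return {
        "model": "deepseek-chat",  # assumed identifier pointing at V2.5
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Because chat and coder are now one model, the same payload shape serves
# both a coding prompt and a general question.
payload = build_request("Write a binary search in Python.")
print(json.dumps(payload, indent=2))
```

The payload would be POSTed to `API_URL` with an `Authorization: Bearer <key>` header, as with any OpenAI-compatible service.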
A word of advice on advertising low-cost alternatives.
'The weaknesses make your low cost believable. [..] If you launched Ryan Air and you said we are as good as British Airways but we are half the price, people would go "it does not make sense"'
Just a personal benchmark I follow: the UX on locally run models has diverged vastly.
For the billionth time: there are zero products or services that are NOT in competition with general intelligence. A clause like this simply begs for malicious compliance… go use something else.