July 25th, 2024

Coding with Llama 3.1, New DeepSeek Coder and Mistral Large

Five new AI models have been benchmarked on aider's code editing leaderboard. Claude 3.5 Sonnet, included as a reference point, leads at 77%. DeepSeek Coder V2 0724 is the strongest of the new releases at 73%, helped by its ability to use SEARCH/REPLACE edits.

Five new AI models were released within a few days of each other, with a wide range of code editing capabilities. On aider's code editing leaderboard, Claude 3.5 Sonnet (included for scale, not one of the new releases) leads at 77%. Among the new models, DeepSeek Coder V2 0724 scored 73%, Llama 3.1 405B Instruct 66%, Mistral Large 2 (2407) 60%, Llama 3.1 70B Instruct 59%, and Llama 3.1 8B Instruct 38%. For comparison, GPT-3.5 Turbo (0301) scores 58%.

DeepSeek Coder V2 0724 stands out as the most surprising and effective of the new models, largely because it can reliably perform SEARCH/REPLACE operations, which allows it to edit large files efficiently. In contrast, the Llama 3.1 models, while performing well in evaluations, were less able to use SEARCH/REPLACE effectively. This applies especially to the smaller 70B and 8B models, which are restricted to editing smaller files due to output token limits.
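
To illustrate why SEARCH/REPLACE matters for large files, here is a minimal Python sketch of the general idea. It is not aider's actual implementation: the function name `apply_search_replace` and the rule that the SEARCH text must match exactly once are assumptions made for this example. The point is that the model only has to emit the changed fragment, so its output grows with the size of the edit rather than the size of the file.

```python
def apply_search_replace(source: str, search: str, replace: str) -> str:
    """Replace the single exact occurrence of `search` in `source`.

    Hypothetical helper for illustration only. It refuses to edit when
    the SEARCH text is missing or ambiguous (assumed rule, not aider's).
    """
    count = source.count(search)
    if count != 1:
        raise ValueError(f"expected exactly one match for SEARCH text, found {count}")
    return source.replace(search, replace, 1)


# A stand-in for a (potentially very large) source file.
original = (
    "def greet(name):\n"
    "    print('Hello ' + name)\n"
    "\n"
    "def farewell(name):\n"
    "    print('Bye ' + name)\n"
)

# The model supplies only the fragment to find and its replacement,
# not the whole file.
edited = apply_search_replace(
    original,
    search="    print('Hello ' + name)\n",
    replace="    print(f'Hello, {name}!')\n",
)

print(edited)
```

A model that cannot produce matching SEARCH text reliably has to fall back to returning the whole file, which is where output token limits start to restrict the file sizes it can edit.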

Mistral Large 2 (2407) also struggled with SEARCH/REPLACE functionality, limiting its application to small source files. Overall, while the new models demonstrate promising capabilities, their effectiveness varies significantly, with DeepSeek Coder V2 0724 emerging as a strong contender in the code editing space.

Related

Run the strongest open-source LLM model: Llama3 70B with just a single 4GB GPU

The article discusses the release of the open-source Llama 3 70B model, highlighting its performance compared to GPT-4 and Claude 3 Opus. It emphasizes training enhancements, data quality, and the competition between open and closed-source models.

Claude 3.5 Sonnet

Anthropic introduces Claude 3.5 Sonnet, a fast and cost-effective large language model with new features like Artifacts. Human tests show significant improvements. Privacy and safety evaluations are conducted. Claude 3.5 Sonnet's impact on engineering and coding capabilities is explored, along with recursive self-improvement in AI development.

Llama 3.1 Official Launch

Meta introduces Llama 3.1, an open-source AI model available in 8B, 70B, and 405B versions. The 405B model is highlighted for its versatility in supporting various use cases, including multilingual agents and analyzing large documents. Users can leverage coding assistants, real-time or batch inference, and fine-tuning capabilities. The launch page emphasizes open-source AI and offers subscribers updates via a newsletter.

Llama 3.1: Our most capable models to date

Meta has launched Llama 3.1 405B, an advanced open-source AI model supporting diverse languages and extended context length. It introduces new features like Llama Guard 3 and aims to enhance AI applications with improved models and partnerships.

Meta Llama 3.1 405B

The Meta AI team unveils Llama 3.1 405B, a model optimized for dialogue applications. It competes well with GPT-4o and Claude 3.5 Sonnet, offering versatility and strong performance in evaluations.

2 comments
By @anotherpaulg - 3 months
Five noteworthy models have been released in the last few days, with a wide range of code editing capabilities. Here are their results from aider’s code editing leaderboard with Sonnet and GPT-3.5 included for scale.

  77% claude-3.5-sonnet
  73% DeepSeek Coder V2 0724
  66% llama-3.1-405b-instruct
  60% Mistral Large 2 (2407)
  59% llama-3.1-70b-instruct
  58% gpt-3.5-turbo-0301
  38% llama-3.1-8b-instruct
By @Workaccount2 - 3 months
Is there a reason people seem to exclude google models? I see lots of talk about llama, mistral, deepseek, and then ofc gpt and claude, but not much about gemma and gemini despite them benchmarking well.