September 18th, 2024

Qwen2.5: A Party of Foundation Models

Qwen has released Qwen2.5, a major update featuring specialized models for coding and mathematics. The models are pretrained on up to 18 trillion tokens and support long text generation, structured data comprehension, and multilingual capabilities across 29 languages.


Qwen has announced the release of Qwen2.5, a significant update to its language model family, which includes specialized models for coding and mathematics. This release is touted as one of the largest open-source releases in history, featuring models of various sizes, including Qwen2.5, Qwen2.5-Coder, and Qwen2.5-Math. The models are pretrained on a large-scale dataset of up to 18 trillion tokens, resulting in improved performance in knowledge acquisition, coding, and mathematical reasoning. Qwen2.5 models support long text generation, structured data comprehension, and multilingual capabilities across 29 languages. The specialized models, Qwen2.5-Coder and Qwen2.5-Math, have been enhanced to deliver competitive performance in coding tasks and mathematical reasoning, respectively. Benchmarking results indicate that Qwen2.5-72B outperforms many leading models, while Qwen2.5-Coder excels in coding applications despite its smaller size. The release also includes APIs for flagship models and emphasizes the importance of community collaboration in advancing open-source AI. Future developments will focus on integrating different modalities and enhancing reasoning capabilities.

- Qwen2.5 is a major update with specialized models for coding and mathematics.

- The models are pretrained on a dataset of 18 trillion tokens, improving knowledge and performance.

- Qwen2.5 supports long text generation, structured data comprehension, and multilingual capabilities across 29 languages.

- Benchmarking shows Qwen2.5-72B outperforms many leading models in various tasks.

- Future updates will aim to integrate different modalities and enhance reasoning capabilities.
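
For readers who want to try the released checkpoints, here is a minimal sketch using the standard Hugging Face Transformers chat-template workflow. The Hub model ID and the prompt are assumptions for illustration, not taken from the announcement.

```python
# Minimal sketch: running a Qwen2.5 instruct checkpoint with Hugging Face Transformers.
# The Hub repo name below is an assumption; substitute whichever size you want to try.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # assumed Hub model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."},
]

# apply_chat_template formats the conversation with the model's built-in chat template
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```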

9 comments
By @jcoc611 - 5 months
Probably an ignorant question, but could someone explain why the Context Length is much larger than the Generation Length?
By @freeqaz - 5 months
32B is a nice size for 2x 3090s. That comfortably fits on the GPUs with minimal quantization and still leaves extra memory for the long context length.

70B is just a little rough to run without offloading some layers to the CPU.
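
A rough back-of-envelope check of why 32B fits comfortably on 2x 24 GB cards while 70B-class models get tight; the bytes-per-weight figures below are generic assumptions for dense decoder-only models, not Qwen-specific numbers.

```python
# Back-of-envelope VRAM estimate for model weights alone.
# Ignores KV cache, activations, and framework overhead, which all add on top.

def weight_memory_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory needed just to hold the weights."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params in (32, 72):
    for bits in (16, 8, 4):
        print(f"{params}B @ {bits}-bit ~ {weight_memory_gb(params, bits):.0f} GB")

# Against 2x RTX 3090 = 48 GB total VRAM:
#   32B @ 4-bit ~ 16 GB -> fits, with plenty of headroom for a long-context KV cache
#   32B @ 8-bit ~ 32 GB -> fits, with less headroom
#   72B @ 4-bit ~ 36 GB -> tight once the KV cache grows, hence CPU offloading
```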

By @Flux159 - 5 months
It would be nice to have comparisons to Claude 3.5 for the coder model; only comparing to open-source models isn't super helpful, because I would want to compare against the model I'm currently using for development work.
By @ekojs - 5 months
Actually really impressive. They went up from 7T tokens to 18T tokens. Curious to see how they perform after finetuning.
By @GaggiX - 5 months
> our latest large-scale dataset, encompassing up to 18 trillion tokens

I remember when GPT-3 was trained on 300B tokens.

By @irthomasthomas - 5 months
I'm impressed by the scope of this drop. The raw intelligence of open models seems to be falling behind closed ones, but I think that's because frontier models from OpenAI and Anthropic are not just raw models; they probably include things like chain-of-thought prompting, best-of-N sampling, or control vectors.
By @cateye - 5 months
> we are inspired by the recent advancements in reinforcement learning (e.g., o1)

It will be interesting to see what the future brings as models incorporate chain-of-thought approaches, and whether o1 will be outperformed by open-source models.