September 18th, 2024

Qwen2.5: A Party of Foundation Models

Qwen has released Qwen2.5, a major update featuring specialized models for coding and mathematics. The models are pretrained on up to 18 trillion tokens and support long text generation, structured data comprehension, and multilingual capabilities across 29 languages.


Qwen has announced the release of Qwen2.5, a significant update to its language model family, which includes specialized models for coding and mathematics. This release is touted as one of the largest open-source releases in history, featuring models of various sizes, including Qwen2.5, Qwen2.5-Coder, and Qwen2.5-Math. The models are pretrained on a large-scale dataset of up to 18 trillion tokens, resulting in improved performance in knowledge acquisition, coding, and mathematical reasoning. Qwen2.5 models support long text generation, structured data comprehension, and multilingual capabilities across 29 languages. The specialized models, Qwen2.5-Coder and Qwen2.5-Math, have been enhanced to deliver competitive performance in coding tasks and mathematical reasoning, respectively. Benchmarking results indicate that Qwen2.5-72B outperforms many leading models, while Qwen2.5-Coder excels in coding applications despite its smaller size. The release also includes APIs for flagship models and emphasizes the importance of community collaboration in advancing open-source AI. Future developments will focus on integrating different modalities and enhancing reasoning capabilities.

- Qwen2.5 is a major update with specialized models for coding and mathematics.

- The models are pretrained on a dataset of 18 trillion tokens, improving knowledge and performance.

- Qwen2.5 supports long text generation, structured data comprehension, and multilingual capabilities across 29 languages.

- Benchmarking shows Qwen2.5-72B outperforms many leading models in various tasks.

- Future updates will aim to integrate different modalities and enhance reasoning capabilities.
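
For readers who want to try the released checkpoints, here is a minimal sketch using the standard Hugging Face Transformers chat-template workflow. The Hub model ID and the prompt are assumptions for illustration, not taken from the announcement.

```python
# Minimal sketch: running a Qwen2.5 instruct checkpoint with Hugging Face Transformers.
# The Hub repo name below is an assumption; substitute whichever size you want to try.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # assumed Hub model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."},
]

# apply_chat_template formats the conversation with the model's built-in chat template
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```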

9 comments
By @jcoc611 - 5 months
Probably an ignorant question, but could someone explain why the Context Length is much larger than the Generation Length?
By @freeqaz - 5 months
32B is a nice size for 2x 3090s. That comfortably fits on the GPUs with minimal quantization and still leaves extra memory for the long context length.

70B is just a little rough to run without offloading some layers to the CPU.
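
A rough back-of-envelope check of why 32B fits comfortably on 2x 24 GB cards while 70B-class models get tight; the bytes-per-weight figures below are generic assumptions for dense decoder-only models, not Qwen-specific numbers.

```python
# Back-of-envelope VRAM estimate for model weights alone.
# Ignores KV cache, activations, and framework overhead, which all add on top.

def weight_memory_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory needed just to hold the weights."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params in (32, 72):
    for bits in (16, 8, 4):
        print(f"{params}B @ {bits}-bit ~ {weight_memory_gb(params, bits):.0f} GB")

# Against 2x RTX 3090 = 48 GB total VRAM:
#   32B @ 4-bit ~ 16 GB -> fits, with plenty of headroom for a long-context KV cache
#   32B @ 8-bit ~ 32 GB -> fits, with less headroom
#   72B @ 4-bit ~ 36 GB -> tight once the KV cache grows, hence CPU offloading
```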

By @Flux159 - 5 months
It would be nice to have comparisons to Claude 3.5 for the coder model; only comparing to open-source models isn't super helpful, because I would want to compare against the model I'm currently using for development work.
By @ekojs - 5 months
Actually really impressive. They went up from 7T tokens to 18T tokens. Curious to see how they perform after finetuning.
By @GaggiX - 5 months
> our latest large-scale dataset, encompassing up to 18 trillion tokens

I remember when GPT-3 was trained on 300B tokens.

By @irthomasthomas - 5 months
I'm impressed by the scope of this drop. The raw intelligence of open models seems to be falling behind closed ones, but I think that's because frontier models from OpenAI and Anthropic are not just raw models; they probably include things like chain-of-thought prompting, best-of-N sampling, or control vectors.
By @cateye - 5 months
> we are inspired by the recent advancements in reinforcement learning (e.g., o1)

It will be interesting to see what the future brings as models incorporate chain-of-thought approaches, and whether o1 will be outperformed by open-source models.