Qwen2.5: A Party of Foundation Models
Qwen has released Qwen2.5, a major update featuring specialized models for coding and mathematics, pretrained on 18 trillion tokens, supporting long text generation and multilingual capabilities across 29 languages.
Qwen has announced the release of Qwen2.5, a significant update to its language model family that includes specialized models for coding and mathematics. The team describes it as one of the largest open-source releases to date, spanning multiple model sizes across Qwen2.5, Qwen2.5-Coder, and Qwen2.5-Math. The models are pretrained on a large-scale dataset of up to 18 trillion tokens, resulting in improved knowledge acquisition, coding, and mathematical reasoning. Qwen2.5 models support long text generation, structured data comprehension, and multilingual capabilities across 29 languages. The specialized models, Qwen2.5-Coder and Qwen2.5-Math, have been enhanced to deliver competitive performance in coding tasks and mathematical reasoning, respectively. Benchmarking results indicate that Qwen2.5-72B outperforms many leading models, while Qwen2.5-Coder excels in coding applications despite its smaller size. The release also includes APIs for the flagship models and emphasizes the importance of community collaboration in advancing open-source AI. Future developments will focus on integrating additional modalities and enhancing reasoning capabilities.
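For readers who want to try the open checkpoints, here is a minimal sketch of local inference with Hugging Face transformers; the Qwen/Qwen2.5-7B-Instruct model ID, the chat messages, and the generation settings are illustrative assumptions rather than anything prescribed by the announcement.

```python
# Minimal sketch: running an open Qwen2.5 checkpoint locally with Hugging Face
# transformers. Model ID and generation settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # assumed Hugging Face model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick a dtype based on the checkpoint and hardware
    device_map="auto",    # place layers on the available GPU(s)
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a short poem about open-source AI."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```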
- Qwen2.5 is a major update with specialized models for coding and mathematics.
- The models are pretrained on a dataset of up to 18 trillion tokens, improving knowledge acquisition, coding, and mathematical reasoning.
- Qwen2.5 supports long text generation, structured data comprehension, and multilingual use across 29 languages.
- Benchmarking shows Qwen2.5-72B outperforms many leading models in various tasks.
- Future updates will aim to integrate different modalities and enhance reasoning capabilities.
Related
Large Enough – Mistral AI
Mistral AI released Mistral Large 2, enhancing code generation, reasoning, and multilingual support with 123 billion parameters. It outperforms competitors and is available for research use via various cloud platforms.
Coding with Llama 3.1, New DeepSeek Coder and Mistral Large
Five new AI models for code editing have been released, with Claude 3.5 Sonnet leading at 77%. DeepSeek Coder V2 0724 excels in SEARCH/REPLACE operations, outperforming others.
Gemini Pro 1.5 experimental "version 0801" available for early testing
Google DeepMind's Gemini family of AI models, particularly Gemini 1.5 Pro, excels in multimodal understanding and complex tasks, featuring a two million token context window and improved performance in various benchmarks.
Gemma explained: What's new in Gemma 2
Gemma 2 introduces open models in 2B, 9B, and 27B sizes, enhancing conversational AI with innovations like GQA and logit soft-capping, while future developments will explore the RecurrentGemma model.
Pixtral 12B
Mistral AI released Pixtral 12B, its first multimodal model for image and text processing, achieving 52.5% on the MMMU benchmark and supporting variable image sizes in a 128K token context.
The 70B model is just a little rough to run without offloading some layers to the CPU.
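(In case it helps: here is a sketch of what partial CPU offloading could look like with transformers and accelerate, assuming a single 24 GB GPU, ample system RAM, and the Qwen/Qwen2.5-72B-Instruct checkpoint; the model ID and memory budgets are illustrative, not a tuned recipe.)

```python
# Sketch of partial CPU offloading via accelerate's device_map. The model ID
# and memory budgets are illustrative assumptions; adjust for real hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-72B-Instruct"  # assumed Hugging Face model ID

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",                          # let accelerate split the layers
    max_memory={0: "22GiB", "cpu": "120GiB"},   # cap GPU use, spill the rest to RAM
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```

Offloaded layers run on the CPU, so generation slows down considerably; quantized builds run through llama.cpp are a common alternative when VRAM is tight.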
I remember when GPT-3 was trained on 300B tokens.
It will be interesting to see what the future brings as models incorporate chain-of-thought approaches, and whether o1 gets outperformed by open-source models.