Pixtral 12B
Mistral AI released Pixtral 12B, its first multimodal model for image and text processing; it scores 52.5% on the MMMU benchmark and supports variable image sizes within a 128K-token context.
Mistral AI has announced the release of Pixtral 12B, its first multimodal model, designed to process both images and text. The model pairs a new 400M-parameter vision encoder with a 12B-parameter multimodal decoder, allowing it to handle variable image sizes and multiple images within a 128K-token context window. Pixtral 12B excels at multimodal tasks such as document question answering and chart understanding, scoring 52.5% on the MMMU reasoning benchmark and outperforming several larger models, while maintaining strong performance on text-only benchmarks.

Designed as a drop-in replacement for Mistral NeMo 12B, Pixtral offers best-in-class multimodal reasoning without sacrificing text capabilities. Its architecture processes images at their native resolution, which helps it understand complex diagrams and documents. Benchmarked against both open and closed models, it shows superior performance on instruction-following tasks. Pixtral is available via La Plateforme and Le Chat, and Mistral plans to share open-source prompts and evaluation benchmarks with the community.
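For developers, access through La Plateforme looks roughly like the following. This is a minimal sketch, assuming the mistralai Python SDK (v1) and the pixtral-12b-2409 model id; the prompt and image URL are placeholders, not taken from the announcement.

```python
import os
from mistralai import Mistral

# Minimal sketch: query Pixtral 12B on La Plateforme with one image plus text.
# Assumes the mistralai v1 Python SDK and the "pixtral-12b-2409" model id;
# the image URL below is a placeholder.
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="pixtral-12b-2409",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize the chart in this image."},
                {"type": "image_url", "image_url": "https://example.com/chart.png"},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

Because images and text share one message, multiple images can be interleaved in a single request up to the 128K-token context limit.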
- Pixtral 12B is Mistral AI's first multimodal model, integrating image and text processing.
- It achieves a 52.5% score on the MMMU reasoning benchmark, outperforming larger models.
- The model supports variable image sizes and can process multiple images in a 128K token context.
- Pixtral excels in both multimodal and text-only instruction following tasks.
- It is available for use through La Plateforme and Le Chat, with open-source resources planned.
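Since the weights are released openly (Apache 2.0, per the coverage below), the model can also be served locally. A hedged sketch using vLLM, assuming a build with multimodal support for the mistralai/Pixtral-12B-2409 checkpoint and its Mistral tokenizer mode:

```python
from vllm import LLM
from vllm.sampling_params import SamplingParams

# Sketch: serve Pixtral 12B locally with vLLM. Assumes a vLLM build with
# multimodal support for the mistralai/Pixtral-12B-2409 checkpoint; the
# image URL is a placeholder. Requires a GPU with enough memory for a 12B model.
llm = LLM(model="mistralai/Pixtral-12B-2409", tokenizer_mode="mistral")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What does this diagram show?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/diagram.png"}},
        ],
    }
]

outputs = llm.chat(messages, sampling_params=SamplingParams(max_tokens=256))
print(outputs[0].outputs[0].text)
```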
Related
Mistral NeMo
Mistral AI introduces Mistral NeMo, a powerful 12B model developed with NVIDIA. It features a large context window, strong reasoning abilities, and FP8 inference support. Available under Apache 2.0 license for diverse applications.
Mathstral: 7B LLM designed for math reasoning and scientific discovery
MathΣtral, a new 7B model by Mistral AI, focuses on math reasoning and scientific discovery, inspired by Archimedes and Newton. It excels at STEM reasoning, scoring 56.6% on MATH and 63.47% on MMLU. Released under the Apache 2.0 license to support academic projects, it showcases the performance/speed tradeoffs of specialized models, and further gains are possible through increased inference-time computation. Professor Paul Bourdon's curation of GRE Math Subject Test problems contributed to the model's evaluation; instructions for use and fine-tuning are in the documentation hosted on HuggingFace.
Large Enough – Mistral AI
Mistral AI released Mistral Large 2, enhancing code generation, reasoning, and multilingual support with 123 billion parameters. It outperforms competitors and is available for research use via various cloud platforms.
Mistral Agents
Mistral AI has improved model customization for its flagship models, introduced "Agents" for custom workflows, and released a stable SDK version, enhancing accessibility and effectiveness of generative AI for developers.
Mistral releases Pixtral 12B, its first multimodal model
French AI startup Mistral launched Pixtral 12B, a multimodal model for processing images and text, featuring 12 billion parameters. The model is available under an Apache 2.0 license.
Discussion
- "Asking these questions as a genuine fan of this company—I really want to believe they can succeed and not go the way of StabilityAI."
- "Pixtral 12B MMLU=69.2. Looking at images made it smarter..."