Mistral releases Pixtral 12B, its first multimodal model
French AI startup Mistral launched Pixtral 12B, a multimodal model for processing images and text, featuring 12 billion parameters. The model is available under an Apache 2.0 license.
French AI startup Mistral has launched Pixtral 12B, its first multimodal model, capable of processing both images and text. The model contains 12 billion parameters, is roughly 24GB in size, and is built on Mistral's existing text model, Nemo 12B. Pixtral 12B can answer questions about images of various sizes and formats, whether supplied via URL or as base64-encoded data, and is designed for tasks such as image captioning and object counting, much like other multimodal models such as OpenAI's GPT-4o. It is available for download on GitHub and Hugging Face under an Apache 2.0 license, which permits unrestricted use and fine-tuning. Mistral's recent $645 million funding round, led by General Catalyst, valued the company at $6 billion, positioning it as a significant player in the AI landscape, particularly in Europe. The company aims to provide open models while also offering managed versions and consulting services. However, details about the image data used to train Pixtral 12B remain unclear, raising questions about copyright and data-sourcing practices in the AI industry.
- Mistral's Pixtral 12B is its first multimodal AI model for processing images and text.
- The model features 12 billion parameters and is available for unrestricted use under an Apache 2.0 license.
- Pixtral 12B can perform tasks like image captioning and object counting.
- Mistral recently raised $645 million, valuing the company at $6 billion.
- Concerns about copyright and data sourcing practices in AI training remain unresolved.
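The article notes that images can be supplied to the model via URLs or base64 encoding. A minimal sketch of what such a request body might look like, assuming an OpenAI-compatible serving runtime (e.g. vLLM); the model identifier `pixtral-12b` and the endpoint shape are assumptions to verify against your server's documentation:

```python
import base64
import json

def build_image_query(image_bytes: bytes, question: str) -> dict:
    """Build an OpenAI-style multimodal chat payload: the image is
    base64-encoded and embedded as a data URL alongside the text prompt."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "pixtral-12b",  # hypothetical identifier; check your runtime
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                ],
            }
        ],
    }

# Any image bytes work for building the payload; a real call would POST
# json.dumps(payload) to the server's /v1/chat/completions endpoint.
payload = build_image_query(b"\x89PNG...", "How many objects are in this image?")
print(json.dumps(payload)[:80])
```

For a public image you could instead pass its URL directly in the `image_url` field, skipping the base64 step.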
Related
Mistral NeMo
Mistral AI introduces Mistral NeMo, a powerful 12B model developed with NVIDIA. It features a large context window, strong reasoning abilities, and FP8 inference support. Available under Apache 2.0 license for diverse applications.
Mathstral: 7B LLM designed for math reasoning and scientific discovery
MathΣtral, a new 7B model by Mistral AI, focuses on math reasoning and scientific discovery, inspired by Archimedes and Newton. It excels in STEM with high reasoning abilities, scoring 56.6% on MATH and 63.47% on MMLU. The model's release under Apache 2.0 license supports academic projects, showcasing performance/speed tradeoffs in specialized models. Further enhancements can be achieved through increased inference-time computation. Professor Paul Bourdon's curation of GRE Math Subject Test problems contributed to the model's evaluation. Instructions for model use and fine-tuning are available in the documentation hosted on HuggingFace.
Can the New Mathstral LLM Accurately Compare 9.11 and 9.9?
Mathstral is a new 7B model by Mistral AI for math reasoning, with a 32k context window and Apache 2.0 license. It aims to improve common sense in math problem-solving, deployable locally with LlamaEdge and shareable via GaiaNet for customization and integration.
Large Enough – Mistral AI
Mistral AI released Mistral Large 2, enhancing code generation, reasoning, and multilingual support with 123 billion parameters. It outperforms competitors and is available for research use via various cloud platforms.
Mistral Agents
Mistral AI has improved model customization for its flagship models, introduced "Agents" for custom workflows, and released a stable SDK version, enhancing accessibility and effectiveness of generative AI for developers.
> It’s unclear which image data Mistral might have used to develop Pixtral 12B.
The days of free web scraping, especially of the richer sources of material, are almost gone, with everything from technical measures (API restrictions) to legal ones (copyright) building deep moats. I also wonder what they trained it on. They're not Meta or Google, with endless supplies of user content or exclusive contracts with the Reddits of the internet.
1. This is a VLM, not a text-to-image model. You can give it images, and it can understand them. It doesn't generate images back.
2. It seems like Pixtral 12B benchmarks significantly below Qwen2-VL-7B [1], so if you want the best local model for understanding images, probably use Qwen2. If you want a large open-source model, Qwen2-VL-72B is most likely the best option.
Also, can your model of choice understand your requests to include/omit particular nuances of an image?
Like writing on an ePaper tablet, exporting the PDF, and feeding that into this model to extract todos from the notes, for example.
Or what would be the SotA for this application?