Mathstral: 7B LLM designed for math reasoning and scientific discovery
MathΣtral, a new 7B model from Mistral AI, focuses on math reasoning and scientific discovery, paying tribute to Archimedes and Newton. It excels in STEM subjects, scoring 56.6% on MATH and 63.47% on MMLU. Released under the Apache 2.0 license to support academic projects, the model showcases the performance/speed tradeoffs of specialized models, and its results can be further improved with additional inference-time computation. Professor Paul Bourdon's curation of GRE Math Subject Test problems contributed to the model's evaluation. Instructions for using and fine-tuning the model are available in the documentation, with weights hosted on HuggingFace.
Read original article

MathΣtral, a new model released by Mistral AI, is designed for math reasoning and scientific discovery, and pays tribute to Archimedes. The 7B model has a 32k context window and is published under the Apache 2.0 license. The release aims to support academic projects and work on complex mathematical problems that demand multi-step reasoning, in the spirit of Isaac Newton's contributions. Building on Mistral 7B, MathΣtral specializes in STEM subjects and achieves strong reasoning performance across industry benchmarks, notably scoring 56.6% on MATH and 63.47% on MMLU. The model illustrates the performance/speed tradeoffs possible when building for specialized purposes, a core part of Mistral AI's development philosophy, and its results can be further enhanced with increased inference-time computation. Users are encouraged to refer to the documentation for instructions on using or fine-tuning the model, with weights hosted on HuggingFace. The model's evaluation benefited from Professor Paul Bourdon's curation of GRE Math Subject Test problems.
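As a minimal usage sketch (not taken from the article), the HuggingFace-hosted weights could be loaded with the transformers library roughly as follows. The repo id "mistralai/Mathstral-7B-v0.1" and the generation settings are assumptions here; consult the official model card and documentation for the exact instructions.

# Minimal sketch: loading the HuggingFace-hosted weights with transformers.
# The repo id and generation settings below are assumptions, not from the article.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mathstral-7B-v0.1"  # assumed repo id; check the model card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "What is the sum of the first 100 positive integers? Explain step by step."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The documentation also covers Mistral's own mistral-inference tooling for running the model locally. The note about increased inference-time computation refers to strategies such as sampling several candidate solutions and majority-voting the final answer.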
Related
NuExtract: A LLM for Structured Extraction
NuExtract is a structure extraction model by NuMind, offering tiny and large versions. NuMind also provides NuNER Zero and sentiment analysis models. Mistral 7B, by Mistral AI, excels in benchmarks with innovative attention mechanisms.
My finetuned models beat OpenAI's GPT-4
Alex Strick van Linschoten discusses how his finetuned Mistral, Llama3, and Solar LLMs outperform OpenAI's GPT-4 in accuracy. He emphasizes the challenges of evaluation, the complexity of the models, and the importance of tailored prompts.
MASt3R – Matching and Stereo 3D Reconstruction
MASt3R, a model built on the DUSt3R framework, excels at 3D reconstruction and dense feature matching across image collections. It improves depth estimation, reduces matching errors, and enhances spatial understanding for applications across industries.
Codestral Mamba
Codestral Mamba, a new Mamba2 language model from Mistral AI, excels at code generation, offering linear-time inference and the theoretical ability to model sequences of unbounded length. It rivals transformer-based models, has been tested on in-context retrieval up to 256k tokens, and is well suited for local code assistance. It can be deployed via the mistral-inference SDK or TensorRT-LLM and is open-source under Apache 2.0.
Mistral NeMo
Mistral AI introduces Mistral NeMo, a powerful 12B model developed with NVIDIA. It features a 128k-token context window, strong reasoning abilities, and quantization-aware training for FP8 inference, and is available under the Apache 2.0 license for diverse applications.