Large Enough – Mistral AI
Mistral AI released Mistral Large 2, a 123-billion-parameter model with enhanced code generation, reasoning, and multilingual support. It is competitive with leading models and is available for research use via various cloud platforms.
Mistral AI has announced the release of Mistral Large 2, an advanced version of its flagship model that significantly enhances capabilities in code generation, mathematics, reasoning, and multilingual support. The model features a 128k context window and supports numerous languages, including major European and Asian languages, as well as over 80 programming languages. With 123 billion parameters, Mistral Large 2 is designed for efficient, high-throughput inference on a single node.
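As a rough sanity check on the single-node claim, here is a back-of-envelope estimate of the weight memory in Python; the serving precision is my assumption, since the announcement does not state one:

    # Weight-memory estimate for a 123B-parameter model.
    # Assumption (mine, not from the article): FP16 or FP8 weights.
    params = 123e9
    print(f"FP16 weights: ~{params * 2 / 1e9:.0f} GB")  # ~246 GB
    print(f"FP8 weights:  ~{params * 1 / 1e9:.0f} GB")  # ~123 GB
    # A common 8x80GB GPU node offers 640 GB of HBM, so either precision
    # leaves headroom for the KV cache of the 128k context window.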
The model demonstrates improved performance metrics, achieving 84.0% accuracy on the MMLU benchmark, outperforming its predecessor and performing competitively with models like GPT-4o and Claude 3 Opus. Key enhancements include a reduced tendency to "hallucinate," stronger reasoning, and better instruction following, making it adept at handling complex queries and multi-turn conversations.
Mistral Large 2 is available under the Mistral Research License for research and non-commercial use. It is accessible via la Plateforme and can be tested under the API name mistral-large-2407. The model is part of a broader consolidation of Mistral's offerings, which span general-purpose and specialist models. Mistral AI has also expanded its partnerships with major cloud providers, making its models available on Google Cloud, Azure, Amazon Bedrock, and IBM watsonx. Fine-tuning capabilities for Mistral Large, Mistral NeMo, and Codestral have also been introduced on la Plateforme.
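For anyone who wants to try it, here is a minimal sketch of calling mistral-large-2407 on la Plateforme. The endpoint URL and payload shape follow Mistral's published chat-completions REST API, but verify them against the current documentation:

    import os
    import requests

    # Query mistral-large-2407 via la Plateforme's chat completions endpoint.
    resp = requests.post(
        "https://api.mistral.ai/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
        json={
            "model": "mistral-large-2407",
            "messages": [
                {"role": "user", "content": "Summarize Mistral Large 2 in one sentence."}
            ],
        },
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])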
Related
Codestral Mamba
Codestral Mamba, a new Mamba2 language model by Mistral AI, excels in code generation with linear time inference and infinite sequence modeling. It rivals transformer models, supports 256k tokens, and aids local code assistance. Deployable via mistral-inference SDK or TensorRT-LLM, it's open-source under Apache 2.0.
Mistral NeMo
Mistral AI introduces Mistral NeMo, a powerful 12B model developed with NVIDIA. It features a large context window, strong reasoning abilities, and FP8 inference support. Available under Apache 2.0 license for diverse applications.
Mathstral: 7B LLM designed for math reasoning and scientific discovery
MathΣtral, a new 7B model by Mistral AI, focuses on math reasoning and scientific discovery, inspired by Archimedes and Newton. It excels in STEM with high reasoning abilities, scoring 56.6% on MATH and 63.47% on MMLU. The model's release under Apache 2.0 license supports academic projects, showcasing performance/speed tradeoffs in specialized models. Further enhancements can be achieved through increased inference-time computation. Professor Paul Bourdon's curation of GRE Math Subject Test problems contributed to the model's evaluation. Instructions for model use and fine-tuning are available in the documentation hosted on HuggingFace.
Can the New Mathstral LLM Accurately Compare 9.11 and 9.9?
Mathstral is a new 7B model by Mistral AI for math reasoning, with a 32k context window and Apache 2.0 license. It aims to improve common sense in math problem-solving, deployable locally with LlamaEdge and shareable via GaiaNet for customization and integration.
Run Mistral 7B model using less than 4GB of memory on your Mac with CoreML
Apple introduced Apple Intelligence at WWDC 24, highlighting Core ML's efficiency on Apple Silicon hardware. New features enable running large language models like Mistral 7B on Mac devices with reduced memory usage.
Large 2 - https://chat.mistral.ai/chat
Llama 3.1 405b - https://www.llama2.ai/
I just tested Mistral Large 2 and Llama 3.1 405b on 5 prompts from my Claude history.
I'd rank as:
1. Sonnet 3.5
2. Large 2 and Llama 405b (similar, no clear winner between the two)
If you're using Claude, stick with it.
My Claude wishlist:
1. Smarter (yes, it's the most intelligent, and yes, I wish it was far smarter still)
2. Longer context window (1M+)
3. Native audio input including tone understanding
4. Fewer refusals and less moralizing when refusing
5. Faster
6. More tokens in output
In my experience (benchmarks aside), Claude 3.5 Sonnet absolutely blows everything away.
I'm not really sure how to even test/use Mistral or Llama for everyday use though.
It seems to be competitive with Llama 3.1 405b but with a much more restrictive license.
Given how the difference between these models is shrinking, I think you're better off using Llama 405B to fine-tune the 70B on your specific use case.
This would be different if Large 2 were a major leap in quality, but it doesn't seem to be.
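A minimal sketch of that distill-then-fine-tune idea, assuming a hypothetical teacher_generate callable backed by the 405B model; the resulting JSONL pairs could feed any standard supervised fine-tuning pipeline for the 70B:

    import json

    # Use the large "teacher" model to label task-specific prompts, then
    # fine-tune the smaller model on the resulting prompt/response pairs.
    def build_distillation_set(prompts, teacher_generate, out_path):
        with open(out_path, "w") as f:
            for prompt in prompts:
                pair = {"prompt": prompt, "response": teacher_generate(prompt)}
                f.write(json.dumps(pair) + "\n")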
Very glad that there's a lot of competition at the top, though!
One thing that "exponentialists" forget is that each step also requires exponentially more energy and resources.
We need to figure out how to measure intelligence that is greater than human.
Why does the table below say the "Function Calling" accuracy is about 50%? Does that mean it fails half the time with complex operations?
Provider  | Model                      | Median Latency | Aggregated Speed | Accuracy
----------|----------------------------|----------------|------------------|---------
Anthropic | claude-3-haiku-20240307    | 1.61           | 122.50           | 44.44%
Mistral   | open-mistral-nemo          | 1.37           | 100.37           | 51.85%
OpenAI    | gpt-4o-mini                | 2.13           | 67.59            | 59.26%
Mistral   | mistral-large-latest       | 10.18          | 18.64            | 62.96%
Anthropic | claude-3-5-sonnet-20240620 | 3.61           | 59.70            | 62.96%
OpenAI    | gpt-4o                     | 3.25           | 53.75            | 74.07%
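Accuracy here is presumably the fraction of test prompts where the model emitted an exactly-matching tool call, so at the strictest reading, yes: roughly half fail. A minimal sketch of how such a harness might work; the provider wrapper and test cases are hypothetical stand-ins, not a real library:

    import statistics
    import time

    # Hypothetical harness: run a fixed suite of function-calling prompts
    # against one provider and score strict matches on the expected call.
    def evaluate(provider, cases):
        latencies, correct = [], 0
        for prompt, expected_call in cases:
            start = time.monotonic()
            result = provider.call(prompt)  # hypothetical: the tool call the model chose
            latencies.append(time.monotonic() - start)
            correct += int(result == expected_call)  # strict name + arguments match
        return {
            "median_latency_s": statistics.median(latencies),
            "accuracy": correct / len(cases),
        }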
Apparently Llama 3.1 relied on synthetic data; I'd be very curious about the type of data Mistral uses.
Besides, parameter redundancy seems evident: frontier models used to be 1.8T, then 405B, and now 123B. If future frontier models were <10B or even <1B, that would be a game changer.
Is there a benchmark or something similar that compares this "quality" across different models?
Maybe they are running it on proprietary or semi-proprietary hardware, but if they don't, how much does the market know about where various shipments of NVIDIA processors end up?
I imagine most intelligence agencies are in need of vast quantities.
I presume that if M$ announces new availability of AI compute, it means they have received and put X NVIDIA processors into production, which might make it possible to guesstimate how many within some bounds.
Same with other open market compute facilities.
Is it likely that a significant share of NVIDIA processors are going to government / intelligence agencies / fronts?
It's almost useless because I literally can't use it.
Update: https://support.anthropic.com/en/articles/8325612-does-claud...
45 messages per 5 hours is the limit for Pro users, fewer if Claude is wordy in its responses (which it always is). I hit that limit so fast when I'm investigating something. So annoying.
They used to let you select another, worse model but I don't see that option anymore. le sigh
I tried Codestral and nothing came close. Not even slightly. It was the only LLM that consistently put out code for me that was runnable and idiomatic.
These benchmarks are about as meaningful as the random hardware benchmarks Apple or Intel push to sell their stuff. In the real world, most people will end up making modifications for their specific use case anyway; for those, I'd argue we already have "capable enough" models for the job.
Do they mean inference done on a single machine?