Codestral Mamba
Codestral Mamba, a new Mamba2 language model from Mistral AI, excels in code generation, offering linear-time inference and the ability to model sequences of unbounded length. It performs on par with transformer models, has been tested on contexts up to 256k tokens, and is intended as a local code assistant. It can be deployed via the mistral-inference SDK or TensorRT-LLM and is open-source under the Apache 2.0 license.
Read original article

Codestral Mamba is a new Mamba2 language model specialized in code generation, released by Mistral AI as a tribute to Cleopatra. Unlike Transformer models, Mamba models offer linear-time inference and can in principle model sequences of unbounded length, providing quick responses regardless of input length. The model is designed with advanced code and reasoning capabilities, performing on par with state-of-the-art transformer-based models. Codestral Mamba has been tested on in-context retrieval capabilities up to 256k tokens and is intended to be a useful local code assistant. It can be deployed using the mistral-inference SDK or TensorRT-LLM and is available for download on la Plateforme. Codestral Mamba is released under the Apache 2.0 license, while its counterpart, Codestral 22B, is available under a commercial or community license. Note that Codestral Mamba is an instruction-tuned model with 7,285,403,648 parameters, opening new possibilities in architecture research and code-productivity use cases.
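The linear-time claim is the key architectural difference: self-attention compares every token with every other token, while a state-space model like Mamba does constant work per token. A toy operation-count sketch (not Mistral's code, just an illustration of the scaling):

```python
# Toy illustration: why state-space models (SSMs) like Mamba scale
# linearly with sequence length while self-attention scales quadratically.

def attention_ops(seq_len: int) -> int:
    # Self-attention forms a seq_len x seq_len score matrix:
    # every token attends to every other token.
    return seq_len * seq_len

def ssm_ops(seq_len: int) -> int:
    # An SSM carries a fixed-size state forward, doing
    # constant work per token.
    return seq_len

for n in (1_000, 256_000):
    print(f"{n:>9} tokens: attention ~{attention_ops(n):.1e} ops, SSM ~{ssm_ops(n):.1e} ops")
```

At 256k tokens the gap is five orders of magnitude, which is why long-context retrieval is a natural showcase for the architecture.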
Related
NuExtract: A LLM for Structured Extraction
NuExtract is a structure extraction model by NuMind, offering tiny and large versions. NuMind also provides NuNER Zero and sentiment analysis models. Mistral 7B, by Mistral AI, excels in benchmarks with innovative attention mechanisms.
BM42 – a new baseline for hybrid search
Qdrant introduces BM42, combining BM25 with embeddings to enhance text retrieval. Addressing SPLADE's limitations, it leverages transformer models for semantic information extraction, promising improved retrieval quality and adaptability across domains.
Tokens are a big reason today's generative AI falls short
Generative AI models encounter limitations with tokenization, affecting performance in tasks like math and language processing. Researchers explore alternatives like MambaByte to address tokenization challenges, aiming to enhance model efficiency and capabilities.
Gemma 2 on AWS Lambda with Llamafile
Google released Gemma 2 9B, a compact language model rivaling GPT-3.5. Mozilla's llamafile simplifies deploying models like LLaVA 1.5 and Mistral 7B Instruct, enhancing accessibility to powerful AI models across various systems.
MobileLLM: Optimizing Sub-Billion Parameter Language Models for On-Device Use
The GitHub repository contains MobileLLM code optimized for sub-billion parameter language models for on-device applications. It includes design considerations, code guidelines, outcomes on common sense reasoning tasks, acknowledgements, and licensing details. Contact the repository maintainers for support.
- Users express a desire for better code completion tools, comparing it to older versions of Copilot.
- There are requests for clearer instructions and easier deployment methods, particularly for VS Code.
- Some comments highlight the need for performance metrics and comparisons with other models like DeepSeek and Gemini.
- Several users seek more information on the Mamba architecture and its advantages over transformer models.
- There is criticism of the article's historical reference to Cleopatra, calling it inaccurate and in poor taste.
If they had linked to the instructions in their post (or better yet a link to a one click install of a VS Code Extension), it would help a lot with adoption.
(BTW I consider it malpractice that they are at the top of Hacker News with a model that is of great interest to a large portion of the users, yet they do not have a monetizable call to action on the featured page.)
Has anyone tried this? And then, is it fast(er)?
> We have tested Codestral Mamba on in-context retrieval capabilities up to 256k tokens
Why only 256k tokens? Gemini's context window is 1 million or more and it's (probably) not even using Mamba.
This is coming from someone who understands the general concepts of how LLMs work but has only used the general publicly available tools like ChatGPT, Claude, etc.
I want to see if I have any hardware I can stress and run something locally, but I don't know where to start or even what the available options are.
You just need a Mistral API key: https://console.mistral.ai/api-keys/
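For the hosted route, a minimal sketch of calling Mistral's chat-completions endpoint with the standard library is below. The endpoint path and the model name `open-codestral-mamba` are assumptions based on Mistral's public API docs at the time of writing; verify both before use.

```python
import json
import os
import urllib.request

# Assumed endpoint and model name; check Mistral's API docs to confirm.
API_URL = "https://api.mistral.ai/v1/chat/completions"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completion request."""
    payload = {
        "model": "open-codestral-mamba",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request(
    "Write a Python function that reverses a string.",
    os.environ.get("MISTRAL_API_KEY", "YOUR_KEY"),
)
# urllib.request.urlopen(req)  # uncomment to actually send the request
```

For fully local use, Mistral's post points to the mistral-inference SDK or TensorRT-LLM instead.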
> As a tribute to Cleopatra, whose glorious destiny ended in tragic snake circumstances
but according to Wikipedia this is not true:
> When Cleopatra learned that Octavian planned to bring her to his Roman triumphal procession, she killed herself by poisoning, contrary to the popular belief that she was bitten by an asp.
They're certainly not the first to use Cleopatra this way, nor the most egregious, but plenty of other mamba jokes could have filled that slot while both making more sense and being less crass.