Codestral Mamba
Codestral Mamba, a new Mamba2 language model from Mistral AI, excels in code generation, offering linear-time inference and the ability to model sequences of unbounded length. It performs on par with transformer models, has been tested on contexts up to 256k tokens, and is intended as a local code assistant. It can be deployed via the mistral-inference SDK or TensorRT-LLM and is open-source under the Apache 2.0 license.
Read original article

Codestral Mamba is a new Mamba2 language model specialized in code generation, released by Mistral AI as a tribute to Cleopatra. Unlike Transformer models, Mamba models offer linear-time inference and can in principle model sequences of unbounded length, providing quick responses regardless of input length. The model is designed with advanced code and reasoning capabilities, performing on par with state-of-the-art transformer-based models. Codestral Mamba has been tested on in-context retrieval capabilities up to 256k tokens and is intended to be a useful local code assistant. It can be deployed using the mistral-inference SDK or TensorRT-LLM and is available for download on la Plateforme. Codestral Mamba is released under the Apache 2.0 license, while its counterpart, Codestral 22B, is available under a commercial or community license. Note that Codestral Mamba is an instruction-tuned model with 7,285,403,648 parameters, opening new possibilities in architecture research and code-productivity use cases.
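The linear-time claim is the key architectural difference: self-attention compares every token with every other token, while a state-space model like Mamba does constant work per token. A toy operation-count sketch (not Mistral's code, just an illustration of the scaling):

```python
# Toy illustration: why state-space models (SSMs) like Mamba scale
# linearly with sequence length while self-attention scales quadratically.

def attention_ops(seq_len: int) -> int:
    # Self-attention forms a seq_len x seq_len score matrix:
    # every token attends to every other token.
    return seq_len * seq_len

def ssm_ops(seq_len: int) -> int:
    # An SSM carries a fixed-size state forward, doing
    # constant work per token.
    return seq_len

for n in (1_000, 256_000):
    print(f"{n:>9} tokens: attention ~{attention_ops(n):.1e} ops, SSM ~{ssm_ops(n):.1e} ops")
```

At 256k tokens the gap is five orders of magnitude, which is why long-context retrieval is a natural showcase for the architecture.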
Related
NuExtract: A LLM for Structured Extraction
NuExtract is a structure extraction model by NuMind, offering tiny and large versions. NuMind also provides NuNER Zero and sentiment analysis models. Mistral 7B, by Mistral AI, excels in benchmarks with innovative attention mechanisms.
BM42 – a new baseline for hybrid search
Qdrant introduces BM42, combining BM25 with embeddings to enhance text retrieval. Addressing SPLADE's limitations, it leverages transformer models for semantic information extraction, promising improved retrieval quality and adaptability across domains.
Tokens are a big reason today's generative AI falls short
Generative AI models encounter limitations with tokenization, affecting performance in tasks like math and language processing. Researchers explore alternatives like MambaByte to address tokenization challenges, aiming to enhance model efficiency and capabilities.
Gemma 2 on AWS Lambda with Llamafile
Google released Gemma 2 9B, a compact language model rivaling GPT-3.5. Mozilla's llamafile simplifies deploying models like LLaVA 1.5 and Mistral 7B Instruct, enhancing accessibility to powerful AI models across various systems.
MobileLLM: Optimizing Sub-Billion Parameter Language Models for On-Device Use
The GitHub repository contains MobileLLM code optimized for sub-billion parameter language models for on-device applications. It includes design considerations, code guidelines, outcomes on common sense reasoning tasks, acknowledgements, and licensing details. Contact the repository maintainers for support.
- Users express a desire for better code completion tools, comparing it to older versions of Copilot.
- There are requests for clearer instructions and easier deployment methods, particularly for VS Code.
- Some comments highlight the need for performance metrics and comparisons with other models like DeepSeek and Gemini.
- Several users seek more information on the Mamba architecture and its advantages over transformer models.
- There is criticism of the article's historical reference to Cleopatra, calling it inaccurate and in poor taste.
If they had linked to the instructions in their post (or better yet a link to a one click install of a VS Code Extension), it would help a lot with adoption.
(BTW I consider it malpractice that they are at the top of Hacker News with a model that is of great interest to a large portion of the users, yet they do not have a monetizable call to action on the featured page.)
Has anyone tried this? And then, is it fast(er)?
> We have tested Codestral Mamba on in-context retrieval capabilities up to 256k tokens
Why only 256k tokens? Gemini's context window is 1 million or more and it's (probably) not even using Mamba.
This is coming from someone who understands the general concepts of how LLMs work but has only used the general publicly available tools like ChatGPT, Claude, etc.
I want to see if I have any hardware I can stress and run something locally, but I don't know where to start or even what the available options are.
You just need a Mistral API key: https://console.mistral.ai/api-keys/
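For the hosted route, a minimal sketch of calling Mistral's chat-completions endpoint with the standard library is below. The endpoint path and the model name `open-codestral-mamba` are assumptions based on Mistral's public API docs at the time of writing; verify both before use.

```python
import json
import os
import urllib.request

# Assumed endpoint and model name; check Mistral's API docs to confirm.
API_URL = "https://api.mistral.ai/v1/chat/completions"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completion request."""
    payload = {
        "model": "open-codestral-mamba",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request(
    "Write a Python function that reverses a string.",
    os.environ.get("MISTRAL_API_KEY", "YOUR_KEY"),
)
# urllib.request.urlopen(req)  # uncomment to actually send the request
```

For fully local use, Mistral's post points to the mistral-inference SDK or TensorRT-LLM instead.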
> As a tribute to Cleopatra, whose glorious destiny ended in tragic snake circumstances
but according to Wikipedia this is not true:
> When Cleopatra learned that Octavian planned to bring her to his Roman triumphal procession, she killed herself by poisoning, contrary to the popular belief that she was bitten by an asp.
They're certainly not the first to use Cleopatra this way, nor the most egregious, but plenty of other mamba jokes could have filled that slot while both making more sense and being less crass.