September 25th, 2024

Llama 3.2: Revolutionizing edge AI and vision with open, customizable models

Meta released Llama 3.2, featuring vision models with 11B and 90B parameters, and lightweight text models with 1B and 3B parameters, optimized for edge devices and supporting extensive deployment options.


Meta has announced the release of Llama 3.2, which introduces small and medium-sized vision large language models (LLMs) with 11B and 90B parameters, as well as lightweight text-only models with 1B and 3B parameters. The lightweight models are designed for edge and mobile devices, support a context length of 128K tokens, and are optimized for Qualcomm and MediaTek hardware. The vision models are capable of advanced image understanding tasks, such as document-level comprehension and visual grounding, while the lightweight models excel at multilingual text generation and tool calling. Llama 3.2 models are available for download on llama.com and Hugging Face, and are supported on partner platforms including AWS and Google Cloud. The release also includes the Llama Stack, which simplifies deployment of these models across different environments. Meta emphasizes the importance of openness and collaboration in driving innovation and positions the new models as competitive alternatives to closed models. The Llama 3.2 models have undergone extensive evaluation, demonstrating strong performance in image recognition and visual reasoning tasks. The release aims to empower developers with accessible tools for building applications that prioritize privacy and efficiency.

- Llama 3.2 includes vision models (11B, 90B) and lightweight text models (1B, 3B) for edge devices.

- Models support a context length of 128K tokens and are optimized for Qualcomm and MediaTek hardware.

- Llama Stack simplifies deployment across various environments, enhancing developer accessibility.

- The models demonstrate competitive performance in image understanding and multilingual text generation.

- Meta emphasizes openness and collaboration to foster innovation in AI development.

AI: What people are saying
The release of Meta's Llama 3.2 has generated significant discussion among users, focusing on its capabilities and performance.
  • Users are impressed with the performance of the smaller models (1B and 3B), noting their efficiency and ability to handle complex tasks.
  • There are mixed reviews regarding the larger vision models (11B and 90B), with some users finding them less effective compared to competitors.
  • Many users appreciate Meta's openness in sharing model details and deployment options, fostering a collaborative environment.
  • Concerns about accessibility and performance on various devices, particularly for the larger models, are frequently mentioned.
  • Some users express curiosity about the models' capabilities in specific applications, such as code assistance and multilingual support.
49 comments
By @simonw - about 2 months
I'm absolutely amazed at how capable the new 1B model is, considering it's just a 1.3GB download (for the Ollama GGUF version).

I tried running a full codebase through it (since it can handle 128,000 tokens) and asking it to summarize the code - it did a surprisingly decent job, incomplete but still unbelievable for a model that tiny: https://gist.github.com/simonw/64c5f5b111fe473999144932bef42...
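
For anyone who wants to try the same experiment, here is a rough sketch of one way to do it; the directory path, model tag, and character cutoff are illustrative assumptions rather than what was actually used.

    import pathlib
    import subprocess

    # Concatenate a small codebase into one prompt (rough cutoff to stay under 128k tokens).
    files = sorted(pathlib.Path("my_project").rglob("*.py"))
    code = "\n\n".join(f"# {p}\n{p.read_text()}" for p in files)[:300_000]

    prompt = "Summarize what this codebase does, file by file:\n\n" + code

    # `ollama run` reads the prompt from stdin when it isn't attached to a terminal.
    result = subprocess.run(
        ["ollama", "run", "llama3.2:1b"],
        input=prompt, capture_output=True, text=True,
    )
    print(result.stdout)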

More of my notes here: https://simonwillison.net/2024/Sep/25/llama-32/

I've been trying out the larger image models using the versions hosted on https://lmarena.ai/ - navigate to "Direct Chat" and you can select them from the dropdown and upload images to run prompts.

By @opdahl - about 2 months
I'm blown away by just how open the Llama team at Meta is. It's nice to see that they are not only giving access to the models, but are also open about how they built them. I don't know how the future is going to go in terms of models, but I sure am grateful that Meta has taken this position and is pushing for more openness.
By @a_wild_dandan - about 2 months
"The Llama jumped over the ______!" (Fence? River? Wall? Synagogue?)

With one-hot encoding, the answer is "wall", with 100% probability. Oh, you gave some plausibility to "fence" too? WRONG! ENJOY MORE PENALTY, SCRUB!

I believe this unforgiving dynamic is why model distillation works so well. The original teacher model had to learn via this "hot or cold" game on text answers. But when the student instead imitates the teacher's predicted distributions, it learns from semantically rich answers. That strikes me as vastly more compute-efficient. So it makes sense to me why these Llama 3.2 edge models punch so far above their weight(s). But it still blows my mind thinking how far models have advanced in just a year or two. Kudos to Meta for these releases.
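
To make the contrast concrete, here is a minimal sketch of the two training signals being compared: hard one-hot cross-entropy versus soft-target distillation. The temperature and the 50/50 loss mix are illustrative assumptions, not anything Meta has stated about how the 1B/3B models were distilled.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, target_ids, T=2.0, alpha=0.5):
        # Hard loss: one-hot target -- only the single "correct" next token gets credit.
        hard = F.cross_entropy(student_logits, target_ids)
        # Soft loss: match the teacher's full next-token distribution,
        # which also rewards plausible alternatives like "fence".
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        return alpha * hard + (1 - alpha) * soft

    # Toy usage: a batch of 4 token positions over a 32k-token vocabulary.
    student_logits = torch.randn(4, 32000)
    teacher_logits = torch.randn(4, 32000)
    target_ids = torch.randint(0, 32000, (4,))
    print(distillation_loss(student_logits, teacher_logits, target_ids))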

By @alanzhuly - about 2 months
Llama 3.2 3B feels a lot better than other models of the same size (e.g. Gemma 2, Phi-3.5-mini).

For anyone looking for a simple way to test Llama 3.2 3B locally with a UI, install nexa-sdk (https://github.com/NexaAI/nexa-sdk) and type in a terminal:

nexa run llama3.2 --streamlit

Disclaimer: I am from Nexa AI, and nexa-sdk is open source. We'd love your feedback.

By @freedomben - about 2 months
If anyone else is looking for the bigger models on Ollama and wondering where they are, the Ollama blog post answered that for me. They are "coming soon", so they just aren't ready quite yet [1]. I was a little worried when I couldn't find them, but it sounds like we just need to be patient.

[1]: https://ollama.com/blog/llama3.2

By @moffkalast - about 2 months
I've just tested the 1B and 3B at Q8, some interesting bits:

- The 1B is extremely coherent (feels something like maybe Mistral 7B at 4 bits), and with flash attention and a 4-bit KV cache it only uses about 4.2 GB of VRAM for 128k context (see the rough estimate sketched at the end of this comment)

- A Pi 5 runs the 1B at 8.4 tok/s; I haven't tested the 3B yet, but it might need a lower quant to fit, and with 9T training tokens it'll probably degrade pretty badly

- The 3B is a certified Gemma-2-2B killer

Given that llama.cpp doesn't support any multimodality (they removed the old implementation), it might be a while before the 11B and 90B become runnable. Doesn't seem like they outperform Qwen-2-VL at vision benchmarks though.
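
For what it's worth, here is a rough back-of-envelope estimate of the KV-cache portion of that 4.2 GB figure, assuming the published Llama 3.2 1B architecture values (16 layers, 8 KV heads, head dim 64). Treat those numbers, and the split between cache, weights, and runtime buffers, as assumptions rather than measurements.

    # Back-of-envelope KV-cache size for Llama 3.2 1B at 128k context.
    layers = 16            # transformer blocks (assumed from the published config)
    kv_heads = 8           # grouped-query attention KV heads (assumed)
    head_dim = 64          # per-head dimension (assumed)
    context = 128_000      # tokens
    bytes_per_value = 0.5  # 4-bit quantized KV cache

    # K and V each store kv_heads * head_dim values per layer per token.
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
    kv_cache_gb = per_token_bytes * context / 1024**3

    print(f"{per_token_bytes / 1024:.0f} KiB per token")  # ~8 KiB
    print(f"~{kv_cache_gb:.1f} GB of KV cache at 128k")   # ~1.0 GB
    # Add roughly 1.3 GB of Q8 weights plus compute buffers and other runtime
    # overhead to get toward the ~4.2 GB total reported above.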

By @dhbradshaw - about 2 months
Tried out 3B on ollama, asking questions in optics, bio, and rust.

It's super fast with a lot of knowledge, a large context and great understanding. Really impressive model.

By @kingkongjaffa - about 2 months
llama3.2:3b-instruct-q8_0 is performing better than 3.1 8b-q4 on my MacBook Pro M1. It's faster and the results are better. It answered a few riddles and thought experiments better despite being 3B vs 8B.

I just removed my install of 3.1-8b.

my ollama list is currently:

    $ ollama list
    NAME                            ID            SIZE    MODIFIED
    llama3.2:3b-instruct-q8_0       e410b836fe61  3.4 GB  2 hours ago
    gemma2:9b-instruct-q4_1         5bfc4cf059e2  6.0 GB  3 days ago
    phi3.5:3.8b-mini-instruct-q8_0  8b50e8e1e216  4.1 GB  3 days ago
    mxbai-embed-large:latest        468836162de7  669 MB  3 months ago

By @kgeist - about 2 months
Tried the 1B model with the "think step by step" prompt.

It gets "which is larger: 9.11 or 9.9?" right if it manages to mention that decimals need to be compared first in its step-by-step thinking. If it skips mentioning decimals, then it says 9.11 is larger.

It gets the strawberry question wrong even after enumerating all the letters correctly, probably because it can't properly count.
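
If anyone wants to reproduce these checks, here is a minimal sketch against a local Ollama server using its standard /api/chat endpoint; the model tag and system prompt are assumptions, not what the commenter used.

    import requests

    def ask(question, model="llama3.2:1b"):
        # Non-streaming chat request to a local Ollama server.
        resp = requests.post(
            "http://localhost:11434/api/chat",
            json={
                "model": model,
                "messages": [
                    {"role": "system", "content": "Think step by step before answering."},
                    {"role": "user", "content": question},
                ],
                "stream": False,
            },
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["message"]["content"]

    print(ask("Which is larger: 9.11 or 9.9?"))
    print(ask("How many times does the letter 'r' appear in 'strawberry'?"))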

By @JohnHammersley - about 2 months
By @getcrunk - about 2 months
Still no 14/30B parameter models since Llama 2. That seriously kills real usability for power users/DIY.

The 7/8B models are great for PoCs and moving minor use cases to the edge… but there's a big, empty gap until 70B, which most people can't run.

The tin foil hat in me says this is the compromise the powers that be have agreed to. Basically being "open" but practically gimped for the average Joe techie. Basically arms control.

By @arnaudsm - about 2 months
Is there an up-to-date leaderboard with multiple LLM benchmarks?

Livebench and Lmsys are weeks behind and sometimes refuse to add some major models. And press releases like this cherry pick their benchmarks and ignore better models like qwen2.5.

If it doesn't exist I'm willing to create it

By @gdiamos - about 2 months
Llama 3.2 includes a 1B parameter model. That should mean roughly 8x higher throughput for data pipelines. In our experience, smaller models are just fine for simple tasks like reading paragraphs from PDF documents.
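
A minimal sketch of that kind of pipeline, using the ollama Python client; the model tag and prompt wording are illustrative assumptions.

    import ollama  # pip install ollama; assumes a local Ollama server is running

    def summarize_paragraphs(paragraphs, model="llama3.2:1b"):
        # Map a cheap 1B model over many small, independent chunks of text.
        summaries = []
        for text in paragraphs:
            result = ollama.generate(
                model=model,
                prompt="Summarize the following paragraph in one sentence:\n\n" + text,
            )
            summaries.append(result["response"])
        return summaries

    print(summarize_paragraphs(["Llama 3.2 ships 1B and 3B text models aimed at edge devices."]))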
By @kombine - about 2 months
Are these models suitable for code assistance, as an alternative to Cursor or Copilot?
By @Ey7NFZ3P0nzAe - about 2 months
Interesting that its scores are somewhat below Pixtral 12B's: https://mistral.ai/news/pixtral-12b/
By @gunalx - about 2 months
The 3B was pretty good at multilingual use (Norwegian): still a lot of gibberish at times, and way more sensitive than the 8B, but more usable than Gemma 2 2B for multilingual work, and fine with my standard "Python list sorter with args" question. But 90B vision just refuses all my actually useful tasks, like helping recreate the images in HTML or doing anything useful with the image data other than describing it. I haven't gotten this stuck with 70B or OpenAI before. An insane number of refusals, all the time.
By @resters - about 2 months
This is great! Does anyone know if the Llama models are trained to do function calling like OpenAI's models are? And/or are there any function calling training datasets?
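The announcement does list tool calling as a strength of the lightweight models, and Meta documents its own prompt format for it. As a format-agnostic sketch of the general pattern (the tool schema, model tag, and JSON convention here are assumptions, not Meta's official format):

    import json
    import ollama  # assumes a local Ollama server with the 3B instruct model pulled

    TOOLS = [{
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {"city": {"type": "string"}},
    }]

    SYSTEM = (
        "You can call the following tools. To call one, reply with ONLY a JSON object "
        'of the form {"tool": "<name>", "arguments": {...}}.\n'
        "Tools: " + json.dumps(TOOLS)
    )

    resp = ollama.chat(
        model="llama3.2:3b",
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": "What's the weather in Oslo?"},
        ],
    )
    content = resp["message"]["content"]
    try:
        call = json.loads(content)       # e.g. {"tool": "get_weather", "arguments": {"city": "Oslo"}}
        print("tool call:", call)
    except json.JSONDecodeError:
        print("plain answer:", content)  # the model chose not to call a tool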
By @l5870uoo9y - about 2 months
> These models are enabled on day one for Qualcomm and MediaTek hardware and optimized for Arm processors.

Do they require a GPU, or can they be deployed on a VPS with a dedicated CPU?

By @chriskanan - about 2 months
The assessments of visual capability really need to be more robust. They are still using datasets like VQAv2, which, while providing some insight, have many issues. There are many newer datasets that serve as much more robust tests and are less prone to being affected by linguistic bias.

I'd like to see more head-to-head comparisons with community created multi-modal LLMs as done in these papers:

https://arxiv.org/abs/2408.05334

https://arxiv.org/abs/2408.03326

I look forward to reading the technical report, once it's available. I couldn't find a link to one yet.

By @sgt - about 2 months
Anyone on HN running models on their own local machines, like smaller Llama models or such? Or something else?
By @404mm - about 2 months
Can anyone recommend a webUI client for ollama?
By @xrd - about 2 months
I'm currently fighting with a FastAPI Python app deployed to Render. It's interesting because I'm struggling to see how to encode the image and send it using curl. Their example sends directly from the browser and uses a data URI.

But, this is relevant because I'm curious how this new model allows image inputs. Do you paste a base64 image into the prompt?

It feels like these models can start not only providing the text generation backend, but replacing the infrastructure for the API as well.

Can you input images without something in front of it like openwebui?
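
On the base64 question: with most local runtimes you don't paste the image into the prompt text; it's sent as a separate base64-encoded field alongside the prompt. Here is a sketch against an Ollama-style /api/generate endpoint; the model tag is hypothetical, and as noted elsewhere in the thread the Llama 3.2 vision models may not be runnable this way yet.

    import base64
    import requests

    with open("photo.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.2-vision:11b",  # hypothetical tag; not available at the time of writing
            "prompt": "Describe this image.",
            "images": [image_b64],           # images travel as a base64 list, not inside the prompt text
            "stream": False,
        },
        timeout=300,
    )
    resp.raise_for_status()
    print(resp.json()["response"])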

By @josephernest - about 2 months
Can it run with llama-cpp-python? If so, where can we find and download the gguf files? Are they distributed directly by meta, or are they converted to gguf format by third parties?
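
The 1B/3B text models share the Llama 3.1 architecture, so llama-cpp-python should handle their GGUFs; the GGUF files are typically converted by third parties on Hugging Face (or with llama.cpp's conversion script) rather than distributed by Meta, which ships the original weights. A minimal sketch, with the file path as a placeholder:

    from llama_cpp import Llama  # pip install llama-cpp-python

    # Placeholder path -- point it at whichever GGUF conversion you downloaded.
    llm = Llama(model_path="./Llama-3.2-3B-Instruct-Q8_0.gguf", n_ctx=8192)

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Give me one sentence about llamas."}],
        max_tokens=128,
    )
    print(out["choices"][0]["message"]["content"])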
By @thimabi - about 2 months
Does anyone know how these models fare in terms of multilingual real-world usage? I’ve used previous iterations of llama models and they all seemed to be lacking in that regard.
By @aussieguy1234 - about 2 months
When using meta.ai, it's able to generate images as well as understand them. Has this also been open sourced, or is it just a GPT-4o-style ability to see images?
By @desireco42 - about 2 months
I have to say that running this model locally, I was pleasantly surprised by how well it ran. It doesn't use as many resources and produces decent output, comparable to ChatGPT. It isn't quite at OpenAI's level, but for a lot of tasks, since it doesn't burden the computer, a local model is good enough.

Next I want to try to use Aider with it and see how this would work.

By @GaggiX - about 2 months
The 90B seems to perform pretty weakly on visual tasks compared to Qwen2-VL-72B: https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct, or am I missing something?
By @notpublic - about 2 months
Llama-3.2-11B-Vision-Instruct does an excellent job extracting/answering questions from screenshots. It is even able to answer questions based on information buried inside a flowchart. How is this even possible??
By @sk11001 - about 2 months
Can one of these models be run on a single machine? What specs do you need?
By @bombi - about 2 months
Is Termux enough to run the 1B model on Android?
By @oulipo - about 2 months
Can the 3B run on an M1 MacBook? It seems to hog all the memory. The 1B runs fine.
By @bottlepalm - about 2 months
What mobile devices can the smaller models run on? iPhone, Android?
By @troupo - about 2 months
It's hallucinating so badly, it's kinda hilarious

Literally everything about the quote below is wrong. The problem with these "amazing ground-breaking models" is that you never know which knowledge they actually possess.

--- start quote ---

>>> what is the last version of Ash Framework for Elixir you know My training data goes up to December 2023, and according to the Ash Framework releases, the last stable version that I'm familiar with is:

* *Ash 2.14*: Released on April 8, 2021.

However, it's worth noting that Ash has been merged into Elixir itself, and most of its features have been integrated into the Elixir language and standard libraries.

In fact, Ash was one of the first frameworks to be included in Elixir, and it played a significant role in shaping the language. Many of Ash's core concepts, such as `defstruct/1` and `defmodule/1`, were later adopted by Elixir itself.

...

Keep in mind that Ash is no longer actively maintained as a separate framework, but its legacy continues to influence Elixir and its community.

--- end quote ---

By @gdiamos - about 2 months
Do inference frameworks like vllm support vision?
By @stogot - about 2 months
Surprised no mention of audio?
By @ofermend - about 2 months
Great release. Models just added to Hallucination Leaderboard: https://github.com/vectara/hallucination-leaderboard.

TL;DR:
- 90B-Vision: 4.3% hallucination rate
- 11B-Vision: 5.5% hallucination rate

By @dharma1 - about 2 months
are these better than qwen at codegen?
By @taytus - about 2 months
meta.ai still running on 3.1
By @84adam - about 2 months
excited for this
By @sva_ - about 2 months
Curious about the multimodal model's architecture. But alas, when I try to request access:

> Llama 3.2 Multimodal is not available in your region.

It sounds like they input the continuous output of an image encoder into a transformer, similar to transfusion[0]? Does someone know where to find more details?

Edit:

> Regarding the licensing terms, Llama 3.2 comes with a very similar license to Llama 3.1, with one key difference in the acceptable use policy: any individual domiciled in, or a company with a principal place of business in, the European Union is not being granted the license rights to use multimodal models included in Llama 3.2. [1]

What a bummer.

0. https://www.arxiv.org/abs/2408.11039

1. https://huggingface.co/blog/llama32#llama-32-license-changes...

By @minimaxir - about 2 months
Off topic/meta, but the Llama 3.2 news topic received many, many HN submissions and upvotes but never made it to the front page: the fact that it's on the front page now indicates that moderators intervened to rescue it: https://news.ycombinator.com/from?site=meta.com (showdead on)

If there's an algorithmic penalty against the news for whatever reason, that may be a flaw in the HN ranking algorithm.

By @nmwnmw - about 2 months
- Llama 3.2 introduces small vision LLMs (11B and 90B parameters) and lightweight text-only models (1B and 3B) for edge/mobile devices, with the smaller models supporting 128K token context.

- The 11B and 90B vision models are competitive with leading closed models like Claude 3 Haiku on image understanding tasks, while being open and customizable.

- Llama 3.2 comes with official Llama Stack distributions to simplify deployment across environments (cloud, on-prem, edge), including support for RAG and safety features.

- The lightweight 1B and 3B models are optimized for on-device use cases like summarization and instruction following.

By @monkfish328 - about 2 months
Zuckerberg has never liked having Android/iOS as gatekeepers, i.e. "platforms", for his apps.

He's hoping to control AI as the next platform through which users interact with apps. Free AI is then fine if the surplus value created by not having a gatekeeper to his apps exceeds the cost of the free AI.

That's the strategy. No values here - just strategy folks.

By @TheAceOfHearts - about 2 months
I still can't access the hosted model at meta.ai from Puerto Rico, despite us being U.S. citizens. I don't know what Meta has against us.

Could someone try giving the 90b model this word search problem [0] and tell me how it performs? So far with every model I've tried, none has ever managed to find a single word correctly.

[0] https://imgur.com/i9Ps1v6

By @alexcpn - about 2 months
In Kung Fu Panda there's a line where the Panda says "I love kung fuuuuuuuu". Well, I normally don't talk like that, but when I saw this (and started to use it), I felt like yelling "I love Metaaaaa"... or is it Llamaaaa, or is it open source, or is it this cool ecosystem that gives such value for free...
By @404mm - about 2 months
Newbie question: what size model would be needed to have 10x software engineer skills and no general human knowledge (i.e., no need to know how to make a pizza or sequence DNA)? Is there such a model?