July 9th, 2024

MobileLLM: Optimizing Sub-Billion Parameter Language Models for On-Device Use

The GitHub repository for MobileLLM provides training code for sub-billion-parameter language models optimized for on-device applications. It covers design considerations, citations, instructions for running the code, results on zero-shot commonsense reasoning tasks, acknowledgements, contact details, and licensing. For further information or support, users are directed to the contacts listed in the repository.

Related

GitHub – Karpathy/LLM101n: LLM101n: Let's Build a Storyteller

The GitHub repository "LLM101n: Let's build a Storyteller" offers a course on creating a Storyteller AI Large Language Model using Python, C, and CUDA. It caters to beginners, covering language modeling, deployment, programming, data types, deep learning, and neural nets. Additional chapters and appendices are available for further exploration.

LLMs on the Command Line

Simon Willison presented LLM, a Python command-line utility for working with Large Language Models, supporting OpenAI models out of the box and other providers via plugins. The tool can run prompts, manage ongoing conversations, target specific models such as Claude 3, and log every interaction to a SQLite database. Willison highlighted using LLM for tasks like summarizing discussions and emphasized the importance of embeddings for semantic search, showcasing LLM's support for content-similarity queries and its extensibility through plugins and OpenAI API compatibility.
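
As a quick illustration, here is a minimal sketch of driving the tool from Python rather than the shell; the model name and prompt are placeholders, and an OpenAI API key is assumed to be configured:

```python
# Requires: pip install llm, plus an OpenAI key (llm keys set openai).
import llm

model = llm.get_model("gpt-4o-mini")  # placeholder; any installed model works
response = model.prompt("Summarize the key points of this discussion.")
print(response.text())
```

The CLI equivalent is a one-liner like `llm -m gpt-4o-mini "..."`, with the interaction logged to SQLite automatically.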

Meta Large Language Model Compiler

Large language models (LLMs) are widely used in software engineering but remain underused for code optimization. Meta introduces the Meta Large Language Model Compiler (LLM Compiler) for code-optimization tasks. Trained on LLVM-IR and assembly code tokens, it aims to improve models' understanding of compilers and to optimize code effectively.

From the Tensor to Stable Diffusion

The GitHub repository offers a comprehensive machine learning guide covering deep learning, vision-language models, neural networks, CNNs, RNNs, and paper implementations like LeNet, AlexNet, ResNet, GRU, LSTM, CBOW, Skip-Gram, Transformer, and BERT. Ideal for exploring machine learning concepts.

Meta AI develops compact language model for mobile devices

Meta AI introduces MobileLLM, a compact language model challenging the assumption that bigger is always better. With under 1 billion parameters, it outperforms preceding state-of-the-art models of the same size by 2.7% to 4.3% on zero-shot commonsense reasoning tasks. MobileLLM's innovations include prioritizing model depth over width, embedding sharing, grouped-query attention, and weight-sharing techniques. The 350-million-parameter version matches the accuracy of much larger models on specific tasks, hinting at compact models' potential for efficiency. While the models themselves are not publicly available, Meta has open-sourced the pre-training code, promoting research into sustainable AI models for personal devices.
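
Of these techniques, grouped-query attention is the easiest to see in code. Below is a minimal, illustrative PyTorch sketch, not MobileLLM's actual implementation, and all dimensions and head counts here are made up: several query heads share one key/value head, which shrinks both the K/V projection weights and the KV cache.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    """Illustrative grouped-query attention: n_heads query heads share
    n_kv_heads key/value heads (n_kv_heads < n_heads), so the K/V
    projections and the KV cache shrink by a factor of n_heads / n_kv_heads."""

    def __init__(self, d_model: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Duplicate each K/V head so every group of query heads has a match.
        groups = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(groups, dim=1)
        v = v.repeat_interleave(groups, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(B, T, -1))
```

With, say, n_heads=8 and n_kv_heads=2, the key and value projections are a quarter of their full multi-head size, and so is the per-token KV cache.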

16 comments
By @mmastrac - 3 months
> MobileLLM-125M/350M attains a remarkable 2.7%/4.3% accuracy boost over preceding 125M/350M SoTA models on zero-shot commonsense reasoning tasks

Small models, slightly improved, probably still not good enough for the same use as online models. Nothing wrong with incremental progress, however.

The 1.5B parameter model does seem to be a pretty decent step up, even beating larger models by a wide margin. I'm not sure why they didn't go larger -- having a more efficient model that fits on hardware the size of the RPi could be a game changer (IIRC TinyLlama 7B does run, barely).

By @lawlessone - 3 months
Does it have to stay on mobile devices? Bit of a niche, but if it's not a resource hog it could be handy for giving NPCs in games more interesting dialogue without having to use

Even better if it could be tuned in some way to allow dialogue to influence NPC behavior or actions.

By @Havoc - 3 months
What apps can one currently use to run them on, say, an iPhone? I'm only aware of the MLC one, which has literally just three old models.
By @PoignardAzur - 3 months
I wonder how much you can push the "deeper and thinner" part. At some point your entire FFN fits into your L2 cache, and you're bound to get some performance jumps.
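
A rough back-of-the-envelope check of that intuition, with hypothetical layer sizes and an assumed int8 quantization (none of these numbers come from the paper):

```python
def ffn_bytes(d_model: int, d_ffn: int, bytes_per_weight: int = 1) -> int:
    """Weight footprint of one SwiGLU-style FFN (three weight matrices),
    assuming int8 quantization, i.e. 1 byte per weight."""
    return 3 * d_model * d_ffn * bytes_per_weight

wide = ffn_bytes(d_model=1024, d_ffn=4096)  # shallow-and-wide layer: ~12 MiB
thin = ffn_bytes(d_model=512, d_ffn=1408)   # deep-and-thin layer:   ~2.1 MiB

L2_CACHE = 4 * 1024 * 1024                  # hypothetical 4 MiB L2
print(f"wide: {wide / 2**20:.1f} MiB, thin: {thin / 2**20:.1f} MiB")
print(f"thin FFN fits in L2: {thin < L2_CACHE}")  # True; the wide one does not
```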
By @yshvrdhn - 3 months
Am I missing something, or couldn't something like distillation help here?
By @banish-m4 - 3 months
Hey HN. I actually have a current need for on-device wake-word-like STT. Which model(s) have the lowest WER and can run on an RPi 4B? I've been looking at openWakeWord. It's for a DIY inventory system.
By @vhiremath4 - 3 months
It seems like the smaller models get the largest size decrease from embedding sharing / weight tying between the linear head and the token embeddings. Is there any research going into how to further reduce size from there?
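
For context, weight tying is a one-liner in PyTorch, and the savings are easy to quantify. The vocabulary and hidden sizes below are illustrative of a model at roughly this scale, not the repo's actual configuration:

```python
import torch.nn as nn

vocab_size, d_model = 32000, 576            # illustrative sizes

embed = nn.Embedding(vocab_size, d_model)   # token embeddings (input side)
lm_head = nn.Linear(d_model, vocab_size, bias=False)  # output projection
lm_head.weight = embed.weight               # tie: one matrix serves both ends

saved = vocab_size * d_model                # parameters no longer duplicated
print(f"parameters saved by tying: {saved / 1e6:.1f}M")  # ~18.4M
```

At this scale the embedding matrix alone is a double-digit percentage of the total parameter budget, which is why tying pays off most for the smallest models.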
By @mark336 - 3 months
How about, instead of Gen AI on the desktop, just AI on the desktop? It could organize all my files, emails, and notes and let me search for information from my own data.
By @sourcecodeplz - 3 months
Nice, could one use this to train models for Windows PCs too? I don't have a lot of RAM.
By @zurfer - 3 months
While this is interesting, I wonder what the use case is, other than better autocomplete?
By @BaculumMeumEst - 3 months
Do Apple Watches have the hardware capability to run inference on a small model? Do I need a developer account to develop on one?
By @KTibow - 3 months
When Gemma 2 2B releases, it would be interesting to compare its scaling with this.
By @pmontra - 3 months
Interesting research, but Meta doesn't have any device worth talking about (at least at scale), unless they want to ship this as part of their apps.
By @cjtrowbridge - 3 months
Why no MMLU or GSM8K?
By @ejdhshsuwisjsh - 3 months
Is anyone aware of custom mobile LLMs?

Optimizing and loading in your own voice, selecting your primary language, and adding a little bit of personal knowledge like nicknames, location, and stuff?

My Pixel 8 can apparently load local models, but I don't have the time right now to follow that rabbit hole.