September 27th, 2024

AMD Unveils Its First Small Language Model AMD-135M

AMD has launched its first small language model, AMD-135M, trained on 670 billion tokens. It features speculative decoding for improved speed and is open-sourced to foster AI community collaboration.

AMD has introduced its first small language model (SLM), AMD-135M, a member of the Llama family. The model was trained from scratch on AMD Instinct MI250 accelerators using 670 billion tokens over six days. It comes in two variants: AMD-Llama-135M and AMD-Llama-135M-code, the latter fine-tuned with an additional 20 billion tokens of code data.

AMD emphasizes an open approach to AI, releasing the training code, dataset, and model weights so developers can reproduce and build on the model. A key feature of AMD-135M is its role in speculative decoding: the small model cheaply drafts candidate tokens, and a larger target model verifies several of them in a single forward pass, reducing the memory-access cost per generated token. Testing showed significant performance improvements on AMD hardware, including the Instinct MI250 accelerator and Ryzen AI processors. AMD aims to foster innovation in the AI community by providing an open-source reference implementation of the model.
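
To make the speculative-decoding idea concrete, here is a minimal greedy sketch. It is illustrative only: `draft_model` and `target_model` are hypothetical callables that map token ids to logits, not AMD's actual implementation.

```python
import torch

def speculative_step(draft_model, target_model, ctx, k=4):
    """Greedy speculative decoding: draft k tokens cheaply with the small
    model, then verify them with one forward pass of the large target model."""
    proposed = []
    draft_ctx = ctx
    for _ in range(k):                       # k cheap draft-model passes
        logits = draft_model(draft_ctx)[:, -1, :]
        tok = logits.argmax(dim=-1, keepdim=True)
        proposed.append(tok)
        draft_ctx = torch.cat([draft_ctx, tok], dim=1)

    # One target-model pass scores all k proposals at once.
    target_logits = target_model(draft_ctx)
    out = [ctx]
    for i, tok in enumerate(proposed):
        pos = ctx.shape[1] + i - 1           # logits at pos predict proposal i
        target_tok = target_logits[:, pos, :].argmax(dim=-1, keepdim=True)
        if torch.equal(target_tok, tok):
            out.append(tok)                  # target agrees: keep draft token
        else:
            out.append(target_tok)           # disagreement: take target's token
            break
    return torch.cat(out, dim=1)             # always advances by >= 1 token
```

The payoff is that several tokens can be committed per expensive target-model pass, while the output stays identical to what greedy decoding with the target model alone would produce.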

- AMD has launched its first small language model, AMD-135M, trained on 670 billion tokens.

- The model features speculative decoding to enhance inference speed and efficiency.

- Both the training code and model weights are open-sourced for community use.

- Performance tests indicate significant speed improvements on AMD hardware.

- AMD aims to promote innovation and collaboration within the AI community.

Related

Llama 3.1 Official Launch

Meta introduces Llama 3.1, an open-source AI model available in 8B, 70B, and 405B versions. The 405B model is highlighted for its versatility across use cases, including multilingual agents and analyzing large documents. Users can leverage coding assistants, real-time or batch inference, and fine-tuning capabilities. Meta emphasizes open-source AI and offers subscribers updates via a newsletter.

Llama 3.1: Our most capable models to date

Meta has launched Llama 3.1 405B, an advanced open-source AI model supporting diverse languages and extended context length. It introduces new features like Llama Guard 3 and aims to enhance AI applications with improved models and partnerships.

Llama 3 Secrets Every Engineer Must Know

Llama 3 is an advanced open-source language model trained on 15 trillion multilingual tokens, featuring 405 billion parameters, improved reasoning, and multilingual capabilities; the article explores practical applications and limitations.

Llama 3.2 released: Multimodal, 1B to 90B sizes

Llama 3.2 has been released as an open-source AI model in sizes from 1B to 90B for text and image processing, enhancing application development; the Llama models have gained significant traction with over 350 million downloads.

LlamaF: An Efficient Llama2 Architecture Accelerator on Embedded FPGAs

The paper presents an FPGA-based accelerator for large language models, achieving a 14.3-15.8× speedup and a 6.1× improvement in power efficiency, enabling deployment in resource-constrained environments.

12 comments
By @diggan - 8 months
> The training code, dataset and weights for this model are open sourced so that developers can reproduce the model and help train other SLMs and LLMs.

Wow, an actual open source language model (maybe even the first of its kind from a larger company?); it includes all you need to recreate it from scratch. Thanks AMD!

Available under this funky GitHub organization it seems: https://github.com/AMD-AIG-AIMA/AMD-LLM
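
For anyone who wants a quick smoke test, the weights also appear on the Hugging Face hub. A minimal sketch, assuming the model id is `amd/AMD-Llama-135M` (verify the exact id on the hub before relying on it):

```python
# Hedged example: assumes the checkpoint is published on Hugging Face as
# "amd/AMD-Llama-135M" (check huggingface.co/amd for the actual id).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amd/AMD-Llama-135M"  # assumed id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```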

By @n_ary - 8 months
Now this is the beginning of real innovation in AI. With AMD coming in (albeit late and slowly) and Meta improving Llama, we will soon see real adoption and development over the next few thousand days. At this moment, I see OAI as the Yahoo of the pre-Google era.
By @highfrequency - 8 months
Looks like they are using sixteen $13k GPUs [1] (around $210k of hardware) for 6 days of training.

Anyone know the recommended cloud provider and equivalent rental price?

[1] https://www.wiredzone.com/shop/product/10025451-supermicro-g...
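
A back-of-the-envelope answer to the rental question; the $2/GPU-hour rate below is a purely illustrative assumption, not any provider's quoted price:

```python
# Rough rental-cost estimate for a run like the one described above.
# The hourly rate is an assumed, illustrative figure.
gpus = 16                 # sixteen accelerators, per the comment
hours = 6 * 24            # six days of training
rate = 2.0                # assumed USD per GPU-hour
print(f"~${gpus * hours * rate:,.0f} for the run")   # ~$4,608
```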

By @benterix - 8 months
I'm happy to see a truly open source model.

Actually, AMD has excellent reasons to make this kind of development and I hope they continue.

By @luyu_wu - 8 months
The section on speculative decoding is interesting. "This approach allows each forward pass to generate multiple tokens without compromising performance, thereby significantly reducing memory access consumption, and enabling several orders of magnitude speed improvements."

Does anyone know if the "several orders of magnitude speed improvement" is accurate? I'm doubtful.

Very interesting though! I'll be playing around with this on the weekend!
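
For calibration: under the standard speculative-decoding analysis (Leviathan et al., 2023), with per-token acceptance rate alpha and k draft tokens, the expected number of tokens per target-model pass is (1 - alpha^(k+1)) / (1 - alpha), which points to a few-fold speedup rather than orders of magnitude. A quick check:

```python
# Expected tokens committed per target-model forward pass under the
# standard speculative-decoding analysis (Leviathan et al., 2023).
def expected_tokens(alpha: float, k: int) -> float:
    return (1 - alpha ** (k + 1)) / (1 - alpha)

for alpha in (0.6, 0.8, 0.9):
    print(alpha, round(expected_tokens(alpha, k=4), 2))
# 0.6 -> 2.31, 0.8 -> 3.36, 0.9 -> 4.1  (a few x, not orders of magnitude)
```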

By @Decabytes - 8 months
Since most people can’t run these LLMs locally, I wonder what a model would look like where we have hyper-tuned models for specific purposes, i.e. a model for code, a model for prose, etc. You have a director model that interprets which downstream model should be used and then runs it. That way you can run the model locally without needing beefy GPUs. It’s a trade-off of using more disk space vs. needing more VRAM.
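
A toy sketch of that director pattern; the routing heuristic and model names below are hypothetical placeholders, not anything that exists:

```python
# Toy sketch of the "director" idea: a cheap router picks a specialist
# model, so only one small model needs to be in memory at a time.
SPECIALISTS = {"code": "code-slm", "prose": "prose-slm"}

def route(prompt: str) -> str:
    """Stand-in for a tiny router model; here just keyword heuristics."""
    code_hints = ("def ", "class ", "import ", "{")
    return "code" if any(h in prompt for h in code_hints) else "prose"

def generate(prompt: str) -> str:
    name = SPECIALISTS[route(prompt)]
    # A real setup would lazily load `name` from disk and run inference,
    # trading disk space for VRAM as the comment suggests.
    return f"[{name}] would handle: {prompt!r}"

print(generate("def quicksort(xs):"))   # routed to the code specialist
print(generate("Write a short poem"))   # routed to the prose specialist
```
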
By @craftkiller - 8 months
I see multiple mentions of NPU on this page, but it's still not clear to me: is this something that can finally use the NPU on my processor?
By @loufe - 8 months
It's always encouraging to see wider hardware platform competition for AI inference and training. Access to affordable and capable hardware for consumers will only benefit (I imagine) from increasing competition.
By @bjt12345 - 8 months
> [1] The training code for AMD-135M is based on TinyLlama, utilizing multi-node distributed training with PyTorch FSDP.

I thought PyTorch didn't work well on AMD hardware, and I've read of many people using JAX instead?
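
PyTorch does ship ROCm builds that expose AMD GPUs through the usual torch.cuda API (HIP is mapped onto it). A quick sanity check, assuming a ROCm build of torch is installed:

```python
# On ROCm builds of PyTorch, AMD GPUs appear through torch.cuda.
import torch

print(torch.version.hip)          # HIP version string on ROCm builds, else None
print(torch.cuda.is_available())  # True if an AMD GPU is visible via ROCm
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. an Instinct or Radeon device
```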

By @rsolva - 8 months
Can this model run on ollama?
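
Probably not out of the box; but once the checkpoint is converted to GGUF (llama.cpp ships conversion scripts for Llama-architecture models), it should run anywhere llama.cpp does, and Ollama can import a local GGUF via a Modelfile. A hedged sketch with the llama-cpp-python bindings; the converted file name below is hypothetical:

```python
# Hedged sketch: assumes the weights have already been converted to GGUF.
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(model_path="amd-llama-135m.gguf")  # hypothetical converted file
out = llm("def fibonacci(n):", max_tokens=32)
print(out["choices"][0]["text"])
```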