July 11th, 2024

Karpathy: Let's reproduce GPT-2 (1.6B): one 8XH100 node, 24h, $672, in llm.c

The GitHub repository hosts Andrej Karpathy's "llm.c" project, which implements Large Language Models in simple C/CUDA without heavyweight libraries. Its emphasis is on pretraining GPT-2 and GPT-3 models.

Read original article

The GitHub repository at the provided URL hosts the "llm.c" project, led by Andrej Karpathy, which implements Large Language Models (LLMs) in simple, pure C/CUDA without relying on heavyweight dependencies such as PyTorch or CPython. Its primary focus is pretraining, in particular reproducing models such as GPT-2 and GPT-3. The repository includes guidance on reproducing GPT-2, running the code on a single GPU in fp32, CPU execution, available datasets, testing, tutorials, multi-GPU and multi-node training, experiments and sweeps, notable forks in other languages, discussions, and licensing. For more detail or support, readers can explore the repository or ask in its GitHub Discussions section.
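
For orientation, the sketch below shows the kind of fp32 training step that llm.c hand-writes in C/CUDA (the repository keeps a PyTorch reference, train_gpt2.py, for checking correctness). This is not the repository's code: the toy model size, hyperparameters, and random batch here are illustrative assumptions only.

```python
# Minimal sketch of an fp32 next-token training step, in the spirit of
# what llm.c reimplements in C/CUDA. Toy-sized model and fake data; none
# of this is the repo's actual configuration.
import torch
import torch.nn.functional as F

class TinyGPT(torch.nn.Module):
    """A GPT-2-shaped toy: token+position embeddings, one pre-norm
    attention block with an MLP, and a language-model head."""
    def __init__(self, vocab=50257, ctx=128, dim=64, heads=4):
        super().__init__()
        self.tok = torch.nn.Embedding(vocab, dim)
        self.pos = torch.nn.Embedding(ctx, dim)
        self.ln1 = torch.nn.LayerNorm(dim)
        self.attn = torch.nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln2 = torch.nn.LayerNorm(dim)
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(dim, 4 * dim), torch.nn.GELU(),
            torch.nn.Linear(4 * dim, dim))
        self.head = torch.nn.Linear(dim, vocab, bias=False)

    def forward(self, idx):
        t = idx.shape[1]
        x = self.tok(idx) + self.pos(torch.arange(t, device=idx.device))
        # Causal mask: True marks positions a token may NOT attend to.
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool,
                                     device=idx.device), diagonal=1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a
        x = x + self.mlp(self.ln2(x))
        return self.head(x)

model = TinyGPT()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
batch = torch.randint(0, 50257, (4, 65))   # fake token ids
logits = model(batch[:, :-1])              # predict each next token
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                       batch[:, 1:].reshape(-1))
opt.zero_grad()
loss.backward()
opt.step()
print(f"step loss: {loss.item():.4f}")
```

llm.c's contribution is to implement this same forward pass, backward pass, and AdamW update directly in C and CUDA kernels, with no PyTorch dependency.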

Related

Run the strongest open-source LLM model: Llama3 70B with just a single 4GB GPU

The article discusses the release of the open-source Llama3 70B model, highlighting its performance compared to GPT-4 and Claude 3 Opus. It emphasizes training improvements, data quality, and the competition between open- and closed-source models.

GitHub – Karpathy/LLM101n: LLM101n: Let's Build a Storyteller

The GitHub repository "LLM101n: Let's build a Storyteller" offers a course on creating a Storyteller AI Large Language Model using Python, C, and CUDA. It caters to beginners, covering language modeling, deployment, programming, data types, deep learning, and neural nets. Additional chapters and appendices are available for further exploration.

LLMs on the Command Line

Simon Willison presented a Python command-line utility for working with Large Language Models (LLMs), supporting OpenAI models out of the box and other providers via plugins. The tool can run prompts, manage conversations, access specific models such as Claude 3, and log every interaction to a SQLite database. Willison highlighted uses such as summarizing discussions, and emphasized embeddings for semantic search, showcasing the tool's support for content-similarity queries and its extensibility through plugins and OpenAI-compatible APIs.
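
As a flavor of the workflow described, here is a short sketch against the tool's documented Python API. The model IDs are assumptions: they only work with the corresponding API keys or plugins installed.

```python
# Sketch of driving Simon Willison's `llm` library from Python. Model IDs
# below are assumptions; substitute whatever models you have configured.
import llm

model = llm.get_model("gpt-4o-mini")       # needs an OpenAI API key set up
response = model.prompt("Summarize this Hacker News thread in one sentence.")
print(response.text())

# Embeddings for semantic search, one of the highlighted features.
embedder = llm.get_embedding_model("3-small")   # assumed OpenAI embedding ID
vector = embedder.embed("reproduce GPT-2 in pure C/CUDA")
print(len(vector))                              # embedding dimensionality
```

The same workflow is available from the shell, e.g. running a prompt directly with the llm command and inspecting the SQLite-backed history with llm logs.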

From the Tensor to Stable Diffusion

The GitHub repository offers a comprehensive machine learning guide covering deep learning, vision-language models, neural networks, CNNs, and RNNs, along with paper implementations such as LeNet, AlexNet, ResNet, GRU, LSTM, CBOW, Skip-Gram, Transformer, and BERT. It is a useful starting point for exploring machine learning concepts.

MobileLLM: Optimizing Sub-Billion Parameter Language Models for On-Device Use

The GitHub repository contains the MobileLLM code, which optimizes sub-billion-parameter language models for on-device applications. It includes design considerations, code guidelines, results on common-sense reasoning tasks, acknowledgements, and licensing details. Contact the repository maintainers for support.

7 comments
By @tomalaci - 3 months
Given how much AI-accelerating hardware NVidia is developing, I expect this will cost maybe a few dozen dollars and train in a few hours within the next few years.

What I think will be interesting is when commodity hardware can run cheap inference from very capable, specialized models. Pretty sure it will spawn a new golden age of AI-powered desktop applications.

For example, video game space has already been trying to create AI-powered NPCs, world generation and story-telling (e.g. Inworld AI).

By @alecco - 3 months
It will be interesting to see this with today's FlashAttention 3 for H100.
By @rurban - 3 months
Would be free for us because we have those H100s, but it's way too hot right now. They reach 70°C, even watercooled.
By @jamestimmins - 3 months
Anyone have an idea if this is feasible to do on a MacBook with a built-in GPU?