August 31st, 2024

Building LLMs from the Ground Up: A 3-Hour Coding Workshop

Sebastian Raschka is hosting a 3-hour workshop on building Large Language Models, covering implementation, training, and evaluation, with resources including a GitHub repository and his book on LLMs.

Sebastian Raschka has announced a 3-hour coding workshop focused on building Large Language Models (LLMs). The workshop aims to provide participants with a comprehensive understanding of LLMs, covering topics such as implementation, training, and usage. The content is structured into several parts, starting with an introduction to LLMs and progressing through the necessary materials, input data handling, coding an LLM architecture, and pretraining. The workshop also includes sections on loading pretrained weights, instruction finetuning, and evaluating performance. The video features clickable chapter marks for easy navigation. This workshop is a follow-up to a previous successful session and is designed for those interested in gaining hands-on experience with LLMs. Participants are encouraged to utilize accompanying resources, including a GitHub repository with workshop code and references to Raschka's book on building LLMs from scratch.

- The workshop is designed to teach participants how to implement and train LLMs.

- It includes various topics such as input data handling, architecture coding, and performance evaluation (a rough, illustrative architecture sketch follows this list).

- The video features clickable chapters for easy navigation through the content.

- This session follows a previous successful workshop and aims to provide hands-on experience.

- Resources include a GitHub repository and a book by Sebastian Raschka on LLMs.
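For readers wondering what "coding an LLM architecture" looks like in practice, below is a minimal sketch of a pre-norm, GPT-style transformer block with causal self-attention, written with standard PyTorch modules. This is not code from the workshop or the book: the class name, hyperparameters, and the use of nn.MultiheadAttention are illustrative choices (a true from-scratch implementation would typically write the attention math out by hand).

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Illustrative pre-norm GPT-style block: causal self-attention + feed-forward."""

    def __init__(self, emb_dim=256, n_heads=4, context_len=128):
        super().__init__()
        self.norm1 = nn.LayerNorm(emb_dim)
        self.attn = nn.MultiheadAttention(emb_dim, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(emb_dim)
        self.ff = nn.Sequential(
            nn.Linear(emb_dim, 4 * emb_dim),
            nn.GELU(),
            nn.Linear(4 * emb_dim, emb_dim),
        )
        # Causal mask: True marks positions a token is NOT allowed to attend to.
        mask = torch.triu(torch.ones(context_len, context_len, dtype=torch.bool), diagonal=1)
        self.register_buffer("causal_mask", mask)

    def forward(self, x):
        seq_len = x.size(1)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=self.causal_mask[:seq_len, :seq_len])
        x = x + attn_out                 # residual connection around attention
        x = x + self.ff(self.norm2(x))   # residual connection around feed-forward
        return x

# Shape check: a batch of 2 sequences, 16 tokens each, 256-dim embeddings.
x = torch.randn(2, 16, 256)
print(TransformerBlock()(x).shape)  # torch.Size([2, 16, 256])
```

A full GPT-style model stacks a number of such blocks between a token-plus-position embedding layer and a final projection back to vocabulary logits; the workshop's pretraining and finetuning sections then train exactly that kind of stack.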

Related

GitHub – Karpathy/LLM101n: LLM101n: Let's Build a Storyteller

The GitHub repository "LLM101n: Let's build a Storyteller" offers a course on creating a Storyteller AI Large Language Model using Python, C, and CUDA. It caters to beginners, covering language modeling, deployment, programming, data types, deep learning, and neural nets. Additional chapters and appendices are available for further exploration.

LLMs on the Command Line

Simon Willison presented a Python command-line utility for accessing Large Language Models (LLMs) efficiently, supporting OpenAI models and plugins for various providers. The tool enables running prompts, managing conversations, accessing specific models like Claude 3, and logging interactions to a SQLite database. Willison highlighted using LLM for tasks like summarizing discussions and emphasized the importance of embeddings for semantic search, showcasing LLM's support for content similarity queries and extensibility through plugins and OpenAI API compatibility.

Karpathy: Let's reproduce GPT-2 (1.6B): one 8XH100 node 24h $672 in llm.c

The GitHub repository focuses on the "llm.c" project by Andrej Karpathy, aiming to implement Large Language Models in C/CUDA without extensive libraries. It emphasizes pretraining GPT-2 and GPT-3 models.

LLMs can solve hard problems

LLMs, like Claude 3.5 'Sonnet', excel in tasks such as generating podcast transcripts, identifying speakers, and creating episode synopses efficiently. Their successful application demonstrates practicality and versatility in problem-solving.

An Open Course on LLMs, Led by Practitioners

A new free course, "Mastering LLMs," offers over 40 hours of content on large language models, featuring workshops by 25 experts, aimed at enhancing AI product development for technical individuals.

AI: What people are saying
The comments on Sebastian Raschka's workshop on building Large Language Models reflect a mix of curiosity, critique, and resource sharing.
  • Some users express excitement about the workshop and the accompanying resources, such as Raschka's book.
  • There are questions about the differences between this workshop and other popular resources, like Andrej Karpathy's video.
  • Critiques arise regarding the use of PyTorch, with some arguing it doesn't equate to building LLMs from scratch.
  • Several commenters share their own resources or experiences related to training models, indicating a collaborative spirit.
  • Concerns are raised about the feasibility of building LLMs today, suggesting a focus on practical applications instead.
16 comments
By @atum47 - 7 months
Excuse my ignorance, but is this different from Andrej Karpathy's video? https://www.youtube.com/watch?v=kCc8FmEb1nY

Anyway I will watch it tonight before bed. Thank you for sharing.

By @abusaidm - 7 months
Nice write-up Sebastian, looking forward to the book. There are lots of details on the LLM and how it's composed; it would be great if you could expand on how Llama and OpenAI might be cleaning and structuring their training data, given that this seems to be where the battle is heading in the long run.
By @alecco - 7 months
Using PyTorch is not "LLMs from the ground up".

It's a fine PyTorch tutorial but let's not pretend it's something low level.

By @paradite - 7 months
I wrote a practical guide on how to train nanoGPT from scratch on Azure a while ago. It's pretty hands-on and easy to follow:

https://16x.engineer/2023/12/29/nanoGPT-azure-T4-ubuntu-guid...

By @theanonymousone - 7 months
It may be unreasonable, but I have a default negativity toward anything that uses the word "coding" instead of programming or development.
By @leopoldj - 7 months
This is the exact level of detail I was looking for. I'm fairly experienced with deep learning and PyTorch and don't want to see them built from scratch. I found Andrej's materials too low level and I tend to get lost in the weeds. This is not a criticism, just a comment for someone in a similar situation as I am.
By @karmakaze - 7 months
This is great. Just yesterday I was wondering how exactly transformers/attention and LLMs work. I'd worked through how back-propagation works in a deep RNN a long while ago and thought it would be interesting to see the rest.
By @alok-g - 7 months
This is great! Hope it works on a Windows 11 machine too (I often find that when Windows isn't explicitly mentioned, the code isn't tested on it and usually fails to work due to random issues).
By @adultSwim - 7 months
This page is just a container for a youtube video. I suggest updating this HN link to point to the video directly, which contains the same links as the page in its description.
By @1zael - 7 months
Sebastian, you are a god among mortals. Thank you.
By @cpill - 7 months
yeah really valuable stuff. so we know how the ginormous model that we can't train or host works (in practice there are so many hacks and optimizations that none of them work like this). great.
By @eclectic29 - 7 months
This is excellent. Thanks for sharing. It's always good to go back to the fundamentals. There's another resource that is also quite good: https://jaykmody.com/blog/gpt-from-scratch/
By @bschmidt1 - 7 months
Love stuff like this. Tangentially I'm working on useful language models without taking the LLM approach:

Next-token prediction: https://github.com/bennyschmidt/next-token-prediction

Good for auto-complete, spellcheck, etc.

AI chatbot: https://github.com/bennyschmidt/llimo

Good for domain-specific conversational chat with instant responses that don't hallucinate.

By @ein0p - 7 months
I’m not sure why you’d want to build an LLM these days - you won’t be able to train it anyway. It’d make a lot of sense to teach people how to build stuff with LLMs, not LLMs themselves.