Building LLMs from the Ground Up: A 3-Hour Coding Workshop
Sebastian Raschka is hosting a 3-hour workshop on building Large Language Models, covering implementation, training, and evaluation, with resources including a GitHub repository and his book on LLMs.
Sebastian Raschka has announced a 3-hour coding workshop focused on building Large Language Models (LLMs). The workshop aims to give participants a comprehensive understanding of LLMs, covering implementation, training, and usage. The content is structured into several parts, starting with an introduction to LLMs and progressing through the necessary materials, input data handling, coding an LLM architecture, and pretraining. The workshop also includes sections on loading pretrained weights, instruction finetuning, and evaluating performance. The video features clickable chapter marks for easy navigation. This workshop is a follow-up to a previous successful session and is designed for those who want hands-on experience with LLMs. Participants are encouraged to use the accompanying resources, including a GitHub repository with the workshop code and Raschka's book on building LLMs from scratch.
- The workshop is designed to teach participants how to implement and train LLMs.
- It includes various topics such as input data handling, architecture coding, and performance evaluation.
- The video features clickable chapters for easy navigation through the content.
- This session follows a previous successful workshop and aims to provide hands-on experience.
- Resources include a GitHub repository and a book by Sebastian Raschka on LLMs.
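To give a flavor of the "input data handling" and pretraining topics the workshop covers, here is a minimal, self-contained sketch of the standard data-preparation step for next-token pretraining: build a vocabulary, encode text to token IDs, and form (input, target) pairs where the target is the input shifted by one token. The function names and the whitespace tokenizer are illustrative assumptions, not taken from the workshop code (which uses a proper tokenizer and PyTorch data loaders):

```python
# Sketch of LLM input-data handling for next-token pretraining.
# Hypothetical helper names; real pipelines use subword tokenizers.

def build_vocab(text):
    """Map each unique whitespace token to an integer ID."""
    return {tok: i for i, tok in enumerate(sorted(set(text.split())))}

def encode(text, vocab):
    """Convert text to a list of token IDs."""
    return [vocab[tok] for tok in text.split()]

def make_pairs(ids, context_len):
    """Slide a window over the IDs; the target is the input shifted by one."""
    pairs = []
    for i in range(len(ids) - context_len):
        x = ids[i : i + context_len]
        y = ids[i + 1 : i + context_len + 1]  # next-token targets
        pairs.append((x, y))
    return pairs

text = "the quick brown fox jumps over the lazy dog"
vocab = build_vocab(text)
ids = encode(text, vocab)
pairs = make_pairs(ids, context_len=4)
print(pairs[0])
```

During pretraining, each `(x, y)` pair is what the model sees: it predicts `y[t]` (the next token) from `x[:t+1]` at every position `t`.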
Related
GitHub – Karpathy/LLM101n: LLM101n: Let's Build a Storyteller
The GitHub repository "LLM101n: Let's build a Storyteller" offers a course on creating a Storyteller AI Large Language Model using Python, C, and CUDA. It caters to beginners, covering language modeling, deployment, programming, data types, deep learning, and neural nets. Additional chapters and appendices are available for further exploration.
LLMs on the Command Line
Simon Willison presented a Python command-line utility for accessing Large Language Models (LLMs) efficiently, supporting OpenAI models and plugins for various providers. The tool enables running prompts, managing conversations, accessing specific models like Claude 3, and logging interactions to a SQLite database. Willison highlighted using LLM for tasks like summarizing discussions and emphasized the importance of embeddings for semantic search, showcasing LLM's support for content similarity queries and extensibility through plugins and OpenAI API compatibility.
Karpathy: Let's reproduce GPT-2 (1.6B): one 8XH100 node 24h $672 in llm.c
The GitHub repository focuses on the "llm.c" project by Andrej Karpathy, aiming to implement Large Language Models in C/CUDA without extensive libraries. It emphasizes pretraining GPT-2 and GPT-3 models.
LLMs can solve hard problems
LLMs, like Claude 3.5 'Sonnet', excel in tasks such as generating podcast transcripts, identifying speakers, and creating episode synopses efficiently. The example demonstrates their practicality and versatility for multi-step problem-solving.
An Open Course on LLMs, Led by Practitioners
A new free course, "Mastering LLMs," offers over 40 hours of content on large language models, featuring workshops by 25 experts, aimed at enhancing AI product development for technical individuals.
- Some users express excitement about the workshop and the accompanying resources, such as Raschka's book.
- There are questions about the differences between this workshop and other popular resources, like Andrej Karpathy's video.
- Critiques arise regarding the use of PyTorch, with some arguing it doesn't equate to building LLMs from scratch.
- Several commenters share their own resources or experiences related to training models, indicating a collaborative spirit.
- Concerns are raised about the feasibility of building LLMs today, suggesting a focus on practical applications instead.
Anyway I will watch it tonight before bed. Thank you for sharing.
It's a fine PyTorch tutorial but let's not pretend it's something low level.
https://16x.engineer/2023/12/29/nanoGPT-azure-T4-ubuntu-guid...
Next-token prediction: https://github.com/bennyschmidt/next-token-prediction
Good for auto-complete, spellcheck, etc.
AI chatbot: https://github.com/bennyschmidt/llimo
Good for domain-specific conversational chat, with instant responses and no hallucination.
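The simplest way to see the idea behind autocomplete-style next-token prediction is a bigram frequency model: count which word follows which, then predict the most frequent successor. This is a toy sketch for illustration only, far simpler than the linked libraries, and all names in it are hypothetical:

```python
from collections import Counter, defaultdict

def train(corpus):
    """Count, for each word, how often each other word follows it."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            counts[a][b] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent successor of `word`, or None if unseen."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train(corpus)
print(predict_next(model, "the"))  # "cat" (follows "the" twice, "dog" once)
```

Real autocomplete systems extend this with longer contexts (n-grams or neural models) and smoothing for unseen words, but the interface is the same: context in, most likely next token out.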