July 6th, 2024

Here’s how you can build and train GPT-2 from scratch using PyTorch

A guide on building a GPT-2 language model from scratch using PyTorch. Emphasizes simplicity, suitable for various levels of expertise. Involves training on a dataset of Taylor Swift and Ed Sheeran songs. Includes code snippets and references.

This article is a step-by-step guide to building and training a GPT-2 language model from scratch using PyTorch, starting with a custom tokenizer and ending with a trained model. The author emphasizes simplicity, keeping the material accessible to readers with varying levels of Python or machine learning experience. The project builds a GPT-2-style model and trains it on a dataset of Taylor Swift and Ed Sheeran songs. The article includes code snippets, explanations, and references to external resources, and covers the model architecture, the data loading process, and the training loop. The goal is to empower readers to construct their own language model and get started with natural language processing. The article hints at a continuation in Part 2 for further exploration.
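
To make those pieces concrete, here is a minimal sketch of the kind of components the article describes: a tokenizer, a tiny GPT-2-style transformer, and a basic training loop in PyTorch. This is not the article's actual code; the hyperparameters, the TinyGPT/Block class names, the lyrics.txt filename, the character-level encoding, and the use of nn.MultiheadAttention are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the article's code): character-level tokenizer,
# tiny GPT-2-style model, and a basic training loop in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

text = open("lyrics.txt").read()            # hypothetical file of song lyrics
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}
encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

block_size, n_embd, n_head, n_layer = 64, 128, 4, 4   # illustrative hyperparameters
vocab_size = len(chars)
data = torch.tensor(encode(text), dtype=torch.long)

def get_batch(batch_size=32):
    # Sample random contiguous chunks; targets are the inputs shifted by one token.
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])
    return x, y

class Block(nn.Module):
    """One pre-norm transformer block: causal self-attention followed by an MLP."""
    def __init__(self):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(nn.Linear(n_embd, 4 * n_embd), nn.GELU(),
                                 nn.Linear(4 * n_embd, n_embd))

    def forward(self, x):
        T = x.size(1)
        # Boolean causal mask: True positions are blocked from attending.
        mask = torch.triu(torch.ones(T, T), diagonal=1).bool()
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + a
        return x + self.mlp(self.ln2(x))

class TinyGPT(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(block_size, n_embd)
        self.blocks = nn.Sequential(*[Block() for _ in range(n_layer)])
        self.ln_f = nn.LayerNorm(n_embd)
        self.head = nn.Linear(n_embd, vocab_size, bias=False)

    def forward(self, idx):
        T = idx.size(1)
        x = self.tok_emb(idx) + self.pos_emb(torch.arange(T, device=idx.device))
        return self.head(self.ln_f(self.blocks(x)))

model = TinyGPT()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
for step in range(1000):
    xb, yb = get_batch()
    logits = model(xb)
    loss = F.cross_entropy(logits.view(-1, vocab_size), yb.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 100 == 0:
        print(step, loss.item())
```

The article itself builds a custom tokenizer rather than the character-level one used here, so treat this only as a rough outline of the overall data-to-training flow it walks through.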

8 comments
By @rty32 - 3 months
Andrej Karpathy's video is probably much better than this:

https://youtu.be/l8pRSuU81PU

By @cjtrowbridge - 3 months
Also check out Andrej's new llm.c library which includes a script to do this from scratch with fineweb.
By @omerhac - 3 months
Cool blog, thanks!

I did a similar project a couple of years ago for a university course, only I also added style transfer, and it turned out pretty cool. I scraped a bunch of news data together with its news section and trained a self-attention language model from scratch; the results were pretty hilarious. The data was in Hebrew, which is a challenge to tokenize because of the morphology. I posted it on arXiv if anyone's interested in the style transfer and tokenization process: https://arxiv.org/abs/2212.03019

By @moffkalast - 3 months
That's cool as a learning experience, but if you're gonna build a language transformer, why not learn a more established open architecture like Llama instead of ClosedAI's outdated nonsense, so that whatever you end up training is plug-and-play compatible with every LLM tool in the universe once converted to a GGUF?

Otherwise it's like learning to build a website and stopping short of actually doing the final bit where you put it on a webserver and run it live.

By @aziis98 - 3 months
The link to the repo looks broken

https://github.com/ajeetkharel/gpt2-from-scratch/

By @KTibow - 3 months
Reminds me of TinyStories. I wonder if this architecture is better or worse than the ones it tested.