February 19th, 2025

I built a large language model "from scratch"

Brett Fitzgerald built a large language model by working through Sebastian Raschka's book, emphasizing hands-on coding, tokenization, fine-tuning for specific tasks, and the value of debugging and physical books for learning.

Brett Fitzgerald shares his experience of building a large language model (LLM) from scratch, guided by Sebastian Raschka's book on the subject. He emphasizes the importance of hands-on coding, choosing to type out every code sample rather than copy it in order to deepen his understanding.

Fitzgerald details the tokenization process, in which text is converted into numerical tokens, and explains how the model is trained on those tokens, learning relationships from the contexts in which they appear in the training data. He describes the model's text generation, in which it repeatedly predicts the next token from the tokens that precede it. He also discusses fine-tuning, which adapts the pretrained model to a specific task such as spam classification.

Reflecting on his learning journey, Fitzgerald notes that while he gained significant insights, he struggled to retain some concepts. He found debugging to be a valuable learning tool and prefers physical books over digital formats for better retention. He concludes by contemplating further study in AI and machine learning, and shares additional resources and updates from Raschka that clarified some of his earlier misunderstandings.
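To make the tokenization step concrete, here is a minimal sketch using tiktoken's GPT-2 byte-pair encoding; the library choice and sample text are illustrative assumptions, not necessarily the author's exact setup:

```python
# A minimal tokenization sketch using tiktoken's GPT-2 byte-pair encoding
# (an illustrative assumption, not necessarily the author's exact setup).
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")

text = "Every effort moves you"
token_ids = tokenizer.encode(text)   # text -> list of integer token IDs
print(token_ids)                     # e.g. [6109, 3626, 6100, 345]

print(tokenizer.decode(token_ids))   # token IDs -> "Every effort moves you"
```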
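The next-token prediction he describes can be sketched as greedy decoding: feed the model the tokens so far, take the most likely next token, append it, and repeat. This sketch assumes a PyTorch model that maps a batch of token-ID sequences to per-position vocabulary logits, as GPT-style models do:

```python
# A hedged sketch of greedy autoregressive generation. `model` is assumed
# to map a (batch, seq_len) tensor of token IDs to (batch, seq_len,
# vocab_size) logits; `context_size` is the model's maximum context window.
import torch

def generate_greedy(model, token_ids, max_new_tokens, context_size):
    for _ in range(max_new_tokens):
        context = token_ids[:, -context_size:]      # trim to the window
        with torch.no_grad():
            logits = model(context)                 # (B, T, vocab_size)
        next_id = torch.argmax(logits[:, -1, :],    # most likely next token
                               dim=-1, keepdim=True)
        token_ids = torch.cat([token_ids, next_id], dim=1)  # append, repeat
    return token_ids
```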
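Fine-tuning for spam classification can likewise be sketched as swapping the model's vocabulary-sized output head for a two-class head and training on labeled messages. The stub model, attribute names, and sizes below are illustrative assumptions, not the author's code:

```python
# A hedged sketch of adapting a pretrained GPT-style model for binary
# spam classification. All names and sizes here are illustrative.
import torch
import torch.nn as nn

emb_dim, vocab_size, num_classes = 768, 50257, 2  # GPT-2-small-like sizes

# Stand-in for the pretrained network: anything that maps token IDs
# (batch, seq_len) to per-position logits and exposes an `out_head`.
class TinyGPTStub(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, emb_dim)
        self.out_head = nn.Linear(emb_dim, vocab_size)

    def forward(self, token_ids):
        return self.out_head(self.tok_emb(token_ids))

model = TinyGPTStub()

# The fine-tuning change: replace the language-modeling head with a
# 2-class head, then train on labeled messages via the last token's logits.
model.out_head = nn.Linear(emb_dim, num_classes)

token_ids = torch.randint(0, vocab_size, (8, 32))  # dummy batch of messages
labels = torch.randint(0, num_classes, (8,))       # 0 = ham, 1 = spam
logits = model(token_ids)[:, -1, :]                # last-token logits
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
```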

- Brett Fitzgerald built a large language model using Sebastian Raschka's book as a guide.

- He emphasized hands-on coding and the process of tokenization in LLM development.

- The model learns relationships between tokens to generate text based on context.

- Fine-tuning allows the model to specialize in specific tasks, like spam detection.

- Fitzgerald prefers physical books for learning and acknowledges the importance of debugging in understanding code.

4 comments
By @withinrafael - about 2 months
My wife and I are working through this book, and while I see its value and plan to finish it, I've found it (thus far) a bit underwhelming. It feels like a collection of Jupyter notebooks stitched together with a loosely edited narrative. Concepts are sometimes introduced without explanation, instructions lack context, and the growing errata list on Manning's website makes me question if I'm absorbing the right information.
By @rglover - about 2 months
Worthwhile read. It helps to learn (or confirm) how things are working under the hood. What's interesting is that understanding how it all works makes it clear that all the hype around models "thinking" or being "sentient" is just marketing fluff for "the math works and it's really impressive how that translates to human-like cognition."
By @sakesun - about 2 months
I've found none of the explanations of how LLMs are built have been satisfying, especially considering how impressive the applications of them are.
By @eps - about 2 months
I don't see any description of the resulting model in the post. Or any results for that matter. Reads more like a book plug.

Am I missing something?