A ChatGPT clone, in 3000 bytes of C, backed by GPT-2 (2023)
Nicholas Carlini created a 3000-byte C program mimicking ChatGPT using the GPT-2 model. It is dependency-free, efficient, and available on GitHub, though output quality is poor.
Nicholas Carlini has developed a minimalistic ChatGPT-like program in only 3000 bytes of C code, which runs inference on the GPT-2 model. The program is dependency-free: it loads the necessary weight matrices and byte-pair encoding (BPE) vocabulary files from the TensorFlow checkpoint, implements a basic linear algebra library for matrix operations, defines the transformer architecture, and handles input and output through a simple BPE decoder. Although the output quality is admittedly poor, the program runs efficiently on modern machines, particularly with the GPT-2 Small model. The implementation includes optimizations such as key-value caching and an efficient matrix multiplication algorithm, allowing it to generate responses in a few seconds. The code is available on GitHub, and the project highlights how much of a neural network's functionality can be captured in a very compact codebase.
- A ChatGPT clone has been implemented in 3000 bytes of C, utilizing the GPT-2 model.
- The program is dependency-free and includes essential components like matrix operations and transformer architecture.
- It runs efficiently on modern machines, particularly with the GPT-2 Small model, despite producing low-quality output.
- The implementation features optimizations such as key-value caching and efficient matrix multiplication.
- The code is publicly available on GitHub for experimentation and use.
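The summary above mentions a hand-rolled linear algebra layer and an efficient matrix multiply. As a rough illustration only (this is not Carlini's code, and his version is more heavily optimized), a naive row-major matrix multiply in C of the kind such a dependency-free inference loop needs might look like this:

#include <stdio.h>

/* Naive dense matrix multiply: out = A (n x k) * B (k x m).
 * All matrices are stored row-major as flat float arrays. */
static void matmul(const float *a, const float *b, float *out,
                   int n, int k, int m) {
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            float sum = 0.0f;
            for (int p = 0; p < k; p++)
                sum += a[i * k + p] * b[p * m + j];
            out[i * m + j] = sum;
        }
    }
}

int main(void) {
    /* 2x2 check: [[1,2],[3,4]] * [[5,6],[7,8]] = [[19,22],[43,50]] */
    float a[] = {1, 2, 3, 4}, b[] = {5, 6, 7, 8}, c[4];
    matmul(a, b, c, 2, 2, 2);
    printf("%g %g\n%g %g\n", c[0], c[1], c[2], c[3]);
    return 0;
}

Per the summary, the real implementation layers key-value caching and a faster matrix-multiply routine on top of this basic idea so that generating each new token reuses earlier work.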
Related
How Good Is ChatGPT at Coding, Really?
A study in IEEE evaluated ChatGPT's coding performance, showing success rates from 0.66% to 89%. ChatGPT excelled in older tasks but struggled with newer challenges, highlighting strengths and vulnerabilities.
OpenAI is releasing GPT-4o Mini, a cheaper, smarter model
OpenAI launches GPT-4o Mini, a cost-effective model surpassing GPT-3.5. It supports text and vision, aiming to handle multimodal inputs. Despite its simplicity, it scored 82% on benchmarks, meeting demand for smaller, affordable AI models.
Programming with ChatGPT
Henrik Warne finds ChatGPT enhances his programming productivity with tailored code snippets, emphasizing the need for testing. He prefers it over GitHub Copilot but is skeptical about LLMs replacing programmers.
OpenAI is shockingly good at unminifying code
The article illustrates how ChatGPT can reverse engineer and unminify JavaScript code in a React application, providing a clear breakdown and a readable TypeScript version for learning purposes.
Show HN: TinyGPT – A Simple, Educational Deep Learning Library
TinyGPT is a Python library for implementing and training GPT models, focusing on educational clarity. It features a modular design, installation via GitHub, and encourages contributions and testing with pytest.
- Some users appreciate the educational value and compactness of the code, viewing it as a demonstration of neural networks.
- Critiques focus on the output quality of the program, with comparisons to other AI models and chatbots.
- There is discussion about the differences between GPT-2 and newer models like GPT-3, particularly regarding improvements in performance.
- Several comments highlight the simplicity and efficiency of the implementation, with some users expressing interest in the underlying mechanics.
- Others share links to alternative implementations and express nostalgia for earlier AI models.
If someone has a hint about where the magic lies, please explain it to me. Is it the GELU function, or the model that's downloaded through the bash script?
I'm wondering what caused the quantum leap between GPT-2 and GPT-3: a bigger model, more data, or both? I know RLHF makes a huge difference, but even the base GPT-3 model (text completion) was very useful, given enough examples.
https://deepdreams.stavros.io/episodes/the-princess-the-fair...
Psst, don't tell anyone. Artificial Intelligence is the black magic we do to make money.
Ok, allowed this time as punching up.
--
If you missed the code link (it's buried in the text): https://github.com/carlini/c-chat-gpt-2
https://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas...
Splotch will compile fine on modern unixen with few changes.
UNARY(GELU, b / 2 * (1 + tanh(.7978845 * (b + .044715 * b * b * b))))
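For anyone staring at that macro: the constant 0.7978845 is sqrt(2/pi), so this is the standard tanh approximation GELU(x) ≈ (x/2) * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))). A minimal stand-alone sketch (my own illustration, not the project's code) of the same formula as a plain C function:

#include <math.h>
#include <stdio.h>

/* tanh approximation of GELU; 0.7978845608 is sqrt(2/pi). */
static double gelu(double x) {
    return 0.5 * x * (1.0 + tanh(0.7978845608 * (x + 0.044715 * x * x * x)));
}

int main(void) {
    for (double x = -2.0; x <= 2.0; x += 1.0)
        printf("gelu(%+.1f) = %+.6f\n", x, gelu(x));
    return 0;
}

In the program itself, the same expression is applied element-wise through the UNARY macro quoted above.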
Could quantized weights on huggingface be used with this?
What type of problems or queries would this be really good at?
bash ./run.sh
AI: How can I help you?
Human: who are you
AI: I am alice.
Human: what can you do for me?
AI: I can help you.
Human: how to say bird in Chinese?
AI: bird in Chinese.
Human: 2+2=?
AI: 2+2= bird.
Look at this article a different way. The author put a lot of information into a small context window so that it is easier for readers to understand transformers. They included a useful code highlighter to ground it.
To soooo many people, even those with strong math/science backgrounds, GPT is magic. This article opens the spell book and lays it out as a computation graph with all the fixings. The code isn't abominable, especially when paired with the prose.
It is a good piece of education, and I will also call it art.
- John Carmack, Lex Fridman Podcast (August 4th, 2022)
This was around 3 months before ChatGPT's initial release.
Timestamped: https://www.youtube.com/watch?v=I845O57ZSy4&t=14677s