December 12th, 2024

A ChatGPT clone, in 3000 bytes of C, backed by GPT-2 (2023)

Nicholas Carlini created a 3000-byte C program mimicking ChatGPT using the GPT-2 model. It is dependency-free, efficient, and available on GitHub, though output quality is poor.


Nicholas Carlini has developed a minimalistic implementation of a ChatGPT-like program using only 3000 bytes of C code, which operates on the GPT-2 model. This program is designed to be dependency-free, loading the necessary weight matrix and byte-pair encoding (BPE) files from TensorFlow. It includes a basic linear algebra library for matrix operations, defines the transformer architecture, and performs inference while handling input and output through a simple BPE decoder. Although the output quality is noted to be poor, the program runs efficiently on modern machines, particularly with the GPT-2 Small model. The implementation features optimizations such as key-value caching and an efficient matrix multiplication algorithm, allowing it to generate responses in a few seconds. The code is available on GitHub for public use. The project highlights the simplicity of neural networks and the potential to encapsulate complex functionalities within a compact codebase.

- A ChatGPT clone has been implemented in 3000 bytes of C, utilizing the GPT-2 model.

- The program is dependency-free and includes essential components like matrix operations and transformer architecture.

- It runs efficiently on modern machines, particularly with the GPT-2 Small model, despite producing low-quality output.

- The implementation features optimizations such as key-value caching and efficient matrix multiplication (a sketch of the core idea follows this list).

- The code is publicly available on GitHub for experimentation and use.
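
As a rough illustration of what that linear-algebra core boils down to, here is a minimal matrix-multiply sketch (my own naming and layout, not Carlini's actual code, which packs the same arithmetic into terse macros and a more cache-friendly loop order):

/* Naive matrix multiply: out[i][j] = sum_k a[i][k] * b[k][j],
   with all matrices stored row-major in flat arrays. */
void matmul(const float *a, const float *b, float *out,
            int rows, int inner, int cols) {
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++) {
            float sum = 0.0f;
            for (int k = 0; k < inner; k++)
                sum += a[i * inner + k] * b[k * cols + j];
            out[i * cols + j] = sum;
        }
}

Key-value caching then amounts to keeping each layer's K and V rows for tokens already processed, so each new token adds only one row of work instead of recomputing attention over the whole sequence.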

AI: What people are saying
The comments reflect a mix of opinions on Nicholas Carlini's C program mimicking ChatGPT using the GPT-2 model.
  • Some users appreciate the educational value and compactness of the code, viewing it as a demonstration of neural networks.
  • Critiques focus on the output quality of the program, with comparisons to other AI models and chatbots.
  • There is discussion about the differences between GPT-2 and newer models like GPT-3, particularly regarding improvements in performance.
  • Several comments highlight the simplicity and efficiency of the implementation, with some users expressing interest in the underlying mechanics.
  • Others share links to alternative implementations and express nostalgia for earlier AI models.
28 comments
By @hilti - 4 months
I haven't run the code, but I'm impressed by the small size. The first ELIZA programs were larger, and within the past 4 years we've reached the point where something like this fits in a few thousand bytes.

If someone has a hint about where the magic lies, please explain it to me. Is it the GELU function, or the model that's downloaded through the bash script?

By @andai - 4 months
I remember playing with GPT-2 when it came out. A friend and I exported our chat logs, fine tuned GPT-2 on them, and had it simulate conversations between us. It was hilarious and sometimes disturbingly accurate.

I'm wondering what caused the quantum leap between GPT-2 and 3? Bigger model, more data, or both? I know RLHF makes a huge difference, but even the base GPT-3 model (text completion) was very useful, given enough examples.

By @stavros - 4 months
I don't know, GPT-2 wrote some of my favorite fairy tales:

https://deepdreams.stavros.io/episodes/the-princess-the-fair...

By @thomasfl - 4 months
"While I mostly put this together for fun, it's a nice demonstration how simple neural networks actually are."

Psst, don't tell anyone. Artificial Intelligence is the black magic we do to make money.

By @hmottestad - 4 months
Is GPT-2 instruction tuned so that it can actually be used for chat? Otherwise I feel it’s more than just a stretch to call this a ChatGPT clone.
By @fulafel - 4 months
> (TAKE THAT LANGUAGES WITH PROPER MACROS. LISP ISN'T ALWAYS BETTER THAN C!)

Ok, allowed this time as punching up.

--

If you missed the code link (it's buried in the text): https://github.com/carlini/c-chat-gpt-2

By @anthk - 4 months
I've seen better with classical AI chatbots:

https://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas...

Splotch will compile fine on modern unixen with few changes.

By @ab_testing - 4 months
Has anybody run it locally to see what kind of output this GPT-2 model generates?
By @zxspectrum1982 - 4 months
These days, you can easily implement your own ChatGPT in a snap by using gptscript: https://github.com/gptscript-ai/gptscript
By @go_prodev - 4 months
GELU really is like magic:

UNARY(GELU, b / 2 * (1 + tanh(.7978845 * (b + .044715 * b * b * b))))
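
For anyone decoding the constants: 0.7978845 is sqrt(2/pi), so this is the standard tanh approximation of GELU. Expanded into a plain standalone function (my own sketch, not the article's macro machinery), it is just:

#include <math.h>

/* Tanh approximation of GELU:
   0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))). */
float gelu(float x) {
    return 0.5f * x * (1.0f + tanhf(0.7978845f * (x + 0.044715f * x * x * x)));
}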

By @atiedebee - 4 months
I don't see how C macros are anything like regex. They match tokens and replace them with different text. Regex is about matching text with relatively complex patterns, and by itself doesn't do any text replacement.
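
To make the distinction concrete, a tiny illustration (mine, not from the thread): a function-like macro is literal token substitution, e.g.

/* The preprocessor rewrites every SQUARE(expr) into ((expr) * (expr))
   before the compiler ever sees it: token replacement, not pattern
   matching over arbitrary text. */
#define SQUARE(x) ((x) * (x))

int nine = SQUARE(3);   /* expands to ((3) * (3)) */
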
By @deadbabe - 4 months
So will it ever be possible to do something similar with ChatGPT 3.5 or so?
By @tmaly - 4 months
What type of hardware could this run on?

Could quantized weights on huggingface be used with this?

What type of problems or queries would this be really good at?

By @ljouhet - 4 months
I love the way it's written: see this ChatGPT-like code that fits on your screen? Let's break it down!
By @boiler_up800 - 4 months
Intelligence as a new fundamental layer! These small programs that do so much really inspire me.
By @benob - 4 months
Should try with smollm2, which is one of the smallest instruction-tuned models.
By @robblbobbl - 4 months
Thanks. For learning, it's really good.
By @ethan-j - 4 months
At least it's kind of a conversation.

bash ./run.sh

AI: How can I help you?
Human: who are you
AI: I am alice.
Human: what can you do for me?
AI: I can help you.
Human: how to say bird in Chinese?
AI: bird in Chinese.
Human: 2+2=?
AI: 2+2= bird.

By @neomantra - 4 months
Are commenters even reading the article, or just bitching about how GPT-2 sucks and being pedantic about LoC metrics?

Look at this article a different way. The author put a lot of information into a small context window so that it's easier for readers to understand transformers. They included a useful code highlighter to ground it.

To soooo many people, even those with a strong math/science background, GPT is magic. This article opens the spell book and lays it out as a computation graph with all the fixings. The code isn't abominable, especially when paired with the prose.

It is a good piece of education and I will also call it art.

By @owenpalmer - 4 months
"I am not a madman for saying that it is likely that the code for artificial general intelligence is going to be tens of thousands of lines of code, not millions of lines of code. This is code that conceivably one individual could write, unlike writing a new web browser or operating system."

- John Carmack, Lex Fridman Podcast (August 4th, 2022)

This was around 3 months before ChatGPT's initial release.

Timestamped: https://www.youtube.com/watch?v=I845O57ZSy4&t=14677s

By @paxys - 4 months
What's the point of minifying code that is anyways going to be compiled?
By @tomcam - 4 months
Jart is already hard at work on a version that’s only 2070 bytes but includes a full lisp interpreter for the prompt parser
By @tgma - 4 months
TL;DR: a code-golf-style C program that does inference on existing TensorFlow model data for GPT-2; not full ChatGPT, and no training or anything like that.