December 12th, 2024

A ChatGPT clone, in 3000 bytes of C, backed by GPT-2 (2023)

Nicholas Carlini created a 3000-byte C program mimicking ChatGPT using the GPT-2 model. It is dependency-free, efficient, and available on GitHub, though output quality is poor.


Nicholas Carlini has developed a minimalistic implementation of a ChatGPT-like program using only 3000 bytes of C code, which operates on the GPT-2 model. This program is designed to be dependency-free, loading the necessary weight matrix and byte-pair encoding (BPE) files from TensorFlow. It includes a basic linear algebra library for matrix operations, defines the transformer architecture, and performs inference while handling input and output through a simple BPE decoder. Although the output quality is noted to be poor, the program runs efficiently on modern machines, particularly with the GPT-2 Small model. The implementation features optimizations such as key-value caching and an efficient matrix multiplication algorithm, allowing it to generate responses in a few seconds. The code is available on GitHub for public use. The project highlights the simplicity of neural networks and the potential to encapsulate complex functionalities within a compact codebase.

- A ChatGPT clone has been implemented in 3000 bytes of C, utilizing the GPT-2 model.

- The program is dependency-free and includes essential components like matrix operations and transformer architecture.

- It runs efficiently on modern machines, particularly with the GPT-2 Small model, despite producing low-quality output.

- The implementation features optimizations such as key-value caching and efficient matrix multiplication (a sketch of the core idea follows this list).

- The code is publicly available on GitHub for experimentation and use.
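
As a rough illustration of what that linear-algebra core boils down to, here is a minimal matrix-multiply sketch (my own naming and layout, not Carlini's actual code, which packs the same arithmetic into terse macros and a more cache-friendly loop order):

/* Naive matrix multiply: out[i][j] = sum_k a[i][k] * b[k][j],
   with all matrices stored row-major in flat arrays. */
void matmul(const float *a, const float *b, float *out,
            int rows, int inner, int cols) {
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++) {
            float sum = 0.0f;
            for (int k = 0; k < inner; k++)
                sum += a[i * inner + k] * b[k * cols + j];
            out[i * cols + j] = sum;
        }
}

Key-value caching then amounts to keeping each layer's K and V rows for tokens already processed, so each new token adds only one row of work instead of recomputing attention over the whole sequence.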

AI: What people are saying
The comments reflect a mix of opinions on Nicholas Carlini's C program mimicking ChatGPT using the GPT-2 model.
  • Some users appreciate the educational value and compactness of the code, viewing it as a demonstration of neural networks.
  • Critiques focus on the output quality of the program, with comparisons to other AI models and chatbots.
  • There is discussion about the differences between GPT-2 and newer models like GPT-3, particularly regarding improvements in performance.
  • Several comments highlight the simplicity and efficiency of the implementation, with some users expressing interest in the underlying mechanics.
  • Others share links to alternative implementations and express nostalgia for earlier AI models.
28 comments
By @hilti - 4 months
I haven't run the code, but I'm impressed by the small size. The first ELIZA programs were larger, and within the past 4 years we've reached the point where something like this fits in a few thousand bytes.

If someone has a hint about where the magic lies, please explain it to me. Is it the GELU function, or the model that's downloaded through the bash script?

By @andai - 4 months
I remember playing with GPT-2 when it came out. A friend and I exported our chat logs, fine tuned GPT-2 on them, and had it simulate conversations between us. It was hilarious and sometimes disturbingly accurate.

I'm wondering what caused the quantum leap between GPT-2 and 3? Bigger model, more data, or both? I know RLHF makes a huge difference, but even the base GPT-3 model (text completion) was very useful, given enough examples.

By @stavros - 4 months
I don't know, GPT-2 wrote some of my favorite fairy tales:

https://deepdreams.stavros.io/episodes/the-princess-the-fair...

By @thomasfl - 4 months
"While I mostly put this together for fun, it's a nice demonstration how simple neural networks actually are."

Psst, don't tell anyone. Artificial Intelligence is the black magic we do to make money.

By @hmottestad - 4 months
Is GPT-2 instruction tuned so that it can actually be used for chat? Otherwise I feel it’s more than just a stretch to call this a ChatGPT clone.
By @fulafel - 4 months
> (TAKE THAT LANGUAGES WITH PROPER MACROS. LISP ISN'T ALWAYS BETTER THAN C!)

Ok, allowed this time as punching up.

--

If you missed the code link (it's buried in the text): https://github.com/carlini/c-chat-gpt-2

By @anthk - 4 months
I've seen better with classical AI chatbots:

https://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas...

Splotch will compile fine on modern unixen with few changes.

By @ab_testing - 4 months
Has anybody run it locally to see what kind of output this GPT-2 model generates?
By @zxspectrum1982 - 4 months
These days, you can easily implement your own ChatGPT in a snap by using gptscript: https://github.com/gptscript-ai/gptscript
By @go_prodev - 4 months
GELU really is like magic:

UNARY(GELU, b / 2 * (1 + tanh(.7978845 * (b + .044715 * b * b * b))))
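
For anyone decoding the constants: 0.7978845 is sqrt(2/pi), so this is the standard tanh approximation of GELU. Expanded into a plain standalone function (my own sketch, not the article's macro machinery), it is just:

#include <math.h>

/* Tanh approximation of GELU:
   0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))). */
float gelu(float x) {
    return 0.5f * x * (1.0f + tanhf(0.7978845f * (x + 0.044715f * x * x * x)));
}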

By @atiedebee - 4 months
I don't see how C macros are anything like regex. They match tokens and replace them with different text. Regex is about matching text with relatively complex patterns, and by itself doesn't do any text replacement.
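
To make the distinction concrete, a tiny illustration (mine, not from the thread): a function-like macro is literal token substitution, e.g.

/* The preprocessor rewrites every SQUARE(expr) into ((expr) * (expr))
   before the compiler ever sees it: token replacement, not pattern
   matching over arbitrary text. */
#define SQUARE(x) ((x) * (x))

int nine = SQUARE(3);   /* expands to ((3) * (3)) */
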
By @deadbabe - 4 months
So will it ever be possible to do something similar with ChatGPT 3.5 or so?
By @tmaly - 4 months
What type of hardware could this run on?

Could quantized weights on huggingface be used with this?

What type of problems or queries would this be really good at?

By @ljouhet - 4 months
I love the way it's written: see this ChatGPT-like code that fits on your screen? Let's break it down!
By @boiler_up800 - 4 months
Intelligence as a new fundamental layer! These small programs that do so much really inspire me.
By @benob - 4 months
Should try with smollm2, which is one of the smallest instruction-tuned models.
By @robblbobbl - 4 months
Thanks. For learning, it's really good.
By @ethan-j - 4 months
At least it's kind of a conversation.

bash ./run.sh

AI: How can I help you?
Human: who are you
AI: I am alice.
Human: what can you do for me?
AI: I can help you.
Human: how to say bird in Chinese?
AI: bird in Chinese.
Human: 2+2=?
AI: 2+2= bird.

By @neomantra - 4 months
Are commenters even reading the article, or just bitching about how GPT-2 sucks and being pedantic about LoC metrics?

Look at this article a different way. The author put a lot of information into a small context window so that it's easier for readers to understand transformers. They included a useful code highlighter to ground it.

To soooo many people, even those with a strong math/science background, GPT is magic. This article opens the spell book and lays it out as a computation graph with all the fixings. The code isn't abominable, especially when paired with the prose.

It is a good piece of education and I will also call it art.

By @owenpalmer - 4 months
"I am not a madman for saying that it is likely that the code for artificial general intelligence is going to be tens of thousands of lines of code, not millions of lines of code. This is code that conceivably one individual could write, unlike writing a new web browser or operating system."

- John Carmack, Lex Fridman Podcast (August 4th, 2022)

This was around 3 months before ChatGPT's initial release.

Timestamped: https://www.youtube.com/watch?v=I845O57ZSy4&t=14677s

By @paxys - 4 months
What's the point of minifying code that is anyways going to be compiled?
By @tomcam - 4 months
Jart is already hard at work on a version that’s only 2070 bytes but includes a full lisp interpreter for the prompt parser
By @tgma - 4 months
TL;DR: a code-golf-style C program that does inference on existing TensorFlow model data for GPT-2; not full ChatGPT, and no training or anything like that.