OpenAI is shockingly good at unminifying code
The article illustrates how ChatGPT can reverse engineer and unminify JavaScript code in a React application, providing a clear breakdown and a readable TypeScript version for learning purposes.
The article discusses the use of ChatGPT to reverse engineer and unminify JavaScript code, specifically within a React application. The author, Frank Fiegel, encountered a minified code block while exploring a component that displayed dynamic ASCII art. Instead of manually deciphering the code or looking for a source map, he decided to leverage ChatGPT to explain the code's functionality. The AI provided a breakdown of the code, detailing its components, such as character set selection, dynamic character generation, and the React component responsible for rendering the ASCII art. Following this, the author requested a TypeScript version of the code, which ChatGPT delivered in a more human-readable format. Although the AI's response missed some implementation details, it was deemed sufficiently clear and useful for learning purposes. The article highlights the potential of using AI tools like ChatGPT for code comprehension and transformation, showcasing a practical application in software development.
- ChatGPT can effectively unminify and explain complex JavaScript code.
- The AI-generated TypeScript version of the code is readable and useful for learning.
- The original code generates dynamic ASCII art based on window size and time.
- Using AI tools can streamline the process of understanding and rewriting code.
- The author found the AI's output valuable despite minor omissions in implementation details.
Related
The Death of the Junior Developer – Steve Yegge
The blog discusses AI models like ChatGPT impacting junior developers in law, writing, editing, and programming. Senior professionals benefit from AI assistants like GPT-4o, Gemini, and Claude 3 Opus, enhancing efficiency and productivity in Chat Oriented Programming (CHOP).
How Good Is ChatGPT at Coding, Really?
A study in IEEE evaluated ChatGPT's coding performance, showing success rates from 0.66% to 89%. ChatGPT excelled in older tasks but struggled with newer challenges, highlighting strengths and vulnerabilities.
Can ChatGPT do data science?
A study led by Bhavya Chopra at Microsoft, with contributions from Ananya Singha and Sumit Gulwani, explored ChatGPT's challenges in data science tasks. Strategies included prompting techniques and leveraging domain expertise for better interactions.
Up to 90% of my code is now generated by AI
A senior full-stack developer discusses the transformative impact of generative AI on programming, emphasizing the importance of creativity, continuous learning, and responsible integration of AI tools in coding practices.
Programming with ChatGPT
Henrik Warne finds ChatGPT enhances his programming productivity with tailored code snippets, emphasizing the need for testing. He prefers it over GitHub CoPilot but is skeptical about LLMs replacing programmers.
- Many users share their positive experiences using LLMs for code unminification and refactoring, highlighting the efficiency and clarity they provide.
- Concerns are raised about the limitations of LLMs, particularly in handling obfuscated code and ensuring semantic fidelity between minified and unminified versions.
- Some commenters discuss the implications of LLMs on code security and obfuscation, questioning the effectiveness of minification as a protective measure.
- There is a debate over the necessity of LLMs for tasks that can be accomplished with existing tools, such as beautifiers for JavaScript.
- Several users express curiosity about the future capabilities of LLMs, including potential applications in reverse engineering and decompilation.
A more general unminification or unobfuscation still seems to be an open problem. In the past I wrote a handful of intentionally obfuscated programs, and in my experience ChatGPT couldn't understand them even at the surface level. For example, a gist for my 160-byte-long Brainfuck interpreter in C had a comment trying to use GPT-4 to explain the code [1], but the "clarified version" bore zero similarity to the original code...
[1] https://gist.github.com/lifthrasiir/596667#gistcomment-47512...
Reminds me of the tool that was provided in older versions of ColdFusion that would "encrypt" your code. It was a very weak algorithm, and didn't take long for someone to write a decrypter. Nevertheless some people didn't like this, because they were using this tool, thinking it was safe for selling their code without giving access to source. (In the late 90s/early 2000s before open source was the overwhelming default)
There’s no denying it. This task is intellectual and does not involve rote memorization. There are not tons and tons of paired examples on the web of minified and unminified code for LLMs to learn from.
The LLM understands what it is unminifying, and in this regard it is generally superior to humans. But only in this specific subject.
After reading through this article, I tried again [0]. It gave me something to understand, though it's obfuscated enough to essentially eval unreadable strings (via the Window object), so it's not enough on its own.
Here was an excerpt of the report I sent to the person:
> For what it’s worth, I dug through the heavily obfuscated JavaScript code and was able to decipher logic that it:
> - Listens for a page load
> - Invokes a facade of calculations which are in theory constant
> - Redirects the page to a malicious site (unk or something)
[0] https://chatgpt.com/share/f51fbd50-8df0-49e9-86ef-fc972bca6b...
Training data would be easy to make in this case. Build tons of free GitHub code with various compilers and train on inverting compilation. This is a case where synthetic training data is appropriate and quite easy to generate.
You could train the decompiler to just invert compilation and then use existing larger code LLMs to do things like add comments.
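As a concrete illustration of that idea, here is a minimal Python sketch (my own toy example, with a whitespace-stripping "minifier" standing in for a real compiler or minifier) of how such synthetic (input, target) pairs could be generated:

```python
import re

def crude_minify(source: str) -> str:
    """Toy stand-in for a compiler/minifier: strip comments, collapse whitespace.
    A real pipeline would invoke an actual compiler on harvested GitHub code."""
    no_comments = re.sub(r"#.*", "", source)
    return re.sub(r"\s+", " ", no_comments).strip()

def make_pairs(sources):
    # Each (minified, original) pair teaches a model to invert the transform.
    return [(crude_minify(src), src) for src in sources]

pairs = make_pairs(["x = 1  # set x\ny = x + 2\n"])
print(pairs[0][0])  # "x = 1 y = x + 2"
```

Since the forward transform is cheap and deterministic, arbitrarily large training sets can be generated from any corpus of readable code.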
It has been incredibly liberating to just feed it a spaghetti mess, ask to detangle it in a more readable way and go from there.
As the author also discovered, LLMs will sometimes miss some details, but that is alright as I will be catching those myself.
Another use case is when I understand what the code does, but can't quite wrap my head around why it is done in that specific way, especially when the author of the code is no longer with the company. I will then simply put the method in the LLM chat, explain what it does, and ask why some things might be done in a specific way.
Again, it isn't always perfect, but more often than not it comes up with explanations that actually make sense, hold up under scrutiny, and give me new insights. It has actually prevented me once or twice from refactoring something in a way that would have caused me headaches down the line.
[0] ChatGPT and, more recently, Open WebUI as a front end to various other models (Claude variants mostly) to see the differences. This also allows for some fun concepts, like having different models review each other's answers.
[ed.: looks like this was an encoding problem, cf. thread below. I'm still a little concerned about correctness though.]
All this makes me think AI's are going to be a strong deflationary force in the future.
Actually this opens up a bigger question. What if I like an open source project but don't like its license? I could just prompt the AI with the open source code and ask it to rewrite it, or to write it in some other language. I'd have to look up whether this is allowed or would be considered copying, and how a judge could even prove it.
I am testing large language models against a ground truth data set we created internally. Quite often when there is a mismatch, I realize the ground truth dataset is wrong, and I feel exactly like the author did.
edit: ChatGPT found out that it's ROT13 and couldn't explain the code directly without deobfuscating it first.
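For context, ROT13 just rotates each letter 13 places, so undoing it is mechanical once it has been identified; a quick Python illustration:

```python
import codecs

# ROT13 rotates each letter 13 places; applying it twice is the identity.
obfuscated = codecs.encode("alert('hi')", "rot13")
print(obfuscated)                          # nyreg('uv')
print(codecs.decode(obfuscated, "rot13"))  # alert('hi')
```

The hard part for the model is recognizing the encoding in the first place, not reversing it.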
> The provided code is quite complex, but I'll break it down into a more understandable format, explaining its different parts and their functionalities.
Reading the above statement generated by ChatGPT, I asked myself: Will we live to the day where these LLMs could take a large binary executable as input, read it, analyze it, understand it, then reply with the above statement?
> I followed up asking to "implement equivalent code in TypeScript and make it human readable" and got the following response.. To my surprise, the response is not only good enough, but it is also very readable.
What if this day came and we can ask these LLMs to rewrite the binary code in [almost] any programming language we want? This would be exciting, yet scary to just think about!
I have been doing the Leetcode thing recently, and even became a subscriber to Leetcode.
What I have been doing is I go through the Grind 75 list (Blind 75 successor list), look for the best big O time and space editorial answer, which often has a Java example, and then go to ChatGPT (I subscribe) or Perplexity (don't subscribe to Pro - yet) and say "convert this to Kotlin", which is the language I know best. JetBrains IDEs and Android Studio are capable of doing this, but Perplexity and ChatGPT are usually capable of doing this as well.
Then I say "make this code more compact". Usually I give it some constraints too - keep the big O space and time complexity the same or lower it, keep the function signature of the assigned function the same, and keep the return explicit, make sure no Kotlin non-null assertions crop up. Sometimes I continually have it run these instructions on each version of the iterated code.
I usually test that the code compiles and returns the correct answers for examples after each iteration of compacting. I also copy answers from one to the other - Perplexity to ChatGPT and then back to Perplexity. The code does not always compile, or give the right answers for the examples. Sometimes I overcompact it - what is clear in four lines becomes too confusing in three compacted lines. I'm not looking for the most compact answer, but a clear answer that is as compact as possible.
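To illustrate that over-compaction tradeoff with a hypothetical example of my own (in Python rather than Kotlin, and not from the thread): both versions below solve two-sum in O(n) time and space, but the second is arguably past the point of clarity:

```python
def two_sum_clear(nums, target):
    """O(n) time, O(n) space: hash map of value -> index."""
    seen = {}
    for i, n in enumerate(nums):
        if target - n in seen:
            return [seen[target - n], i]
        seen[n] = i
    return []

def two_sum_compact(nums, target):
    # Same complexity squeezed into a generator expression; the
    # setdefault side effect makes it noticeably harder to read.
    seen = {}
    return next(([seen[target - n], i] for i, n in enumerate(nums)
                 if target - n in seen or seen.setdefault(n, i) is None), [])
```

Both return the same answers; the first is the kind of "clear but as compact as possible" version worth keeping.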
One question asked about Strings and then later said: what if this was Unicode? So now for String manipulation questions I say assume the String is Unicode, and then at the end ask to show the answer for both ASCII and Unicode. Sometimes the big O time is tricky - it is O(m+n), say, but since m is always less than or equal to n in the program, it is actually O(n), and both Perplexity and ChatGPT can miss that until it is explained.
People bemoan Leetcode as a waste of time, but I am wasting even less time with it, as ChatGPT and Perplexity are helping give me the code I will be demonstrating in interviews. The common advice I have heard from everywhere is don't waste time trying to figure out the answers myself - just look at the given answers, learn them, and then look for patterns (like binary search problems, which are usually similar), so that is what I am doing.
Initially I was a ChatGPT and Perplexity skeptic for early versions of those sites, in terms of programming, as they stumbled more, but they seem well-suited for these self-contained examples and procedures. Not that they don't hallucinate, give programs that don't compile, or give the wrong answers sometimes, but it saves me time ultimately.
Train on java compiled to class files. Then go from class back to java.
Or even:
Train java compiled to class files, and have separate models that train from Clojure to class and Scala to class files. Then see if you can find some crufty (but important) old java project and go: crufty java -> class -> Clojure (or Scala).
If you could do the same with source -> machine instructions, maybe COBOL to C++, or whatever.
We should go back to uncompiled JavaScript code, our democracy depends on it.
I noticed while reading the blog entry that the author described using a search engine multiple times and thought, "I would have asked ChatGPT first for that."
They mostly fail. A human reverse engineer will still do better.
```csv
table_name,column_name,data_type
table_name,column_name1,data_type
table_name,column_name2,data_type
...
```
I have been running it in production for months[1] as a way to import and optimize database schemas for AI consumption. This performs much better than including the `schema.sql` file in the prompt.
[1]: https://www.sqlai.ai/app/datasources/add/database-schema/ai-...
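A rough sketch of how a simple `schema.sql` could be flattened into that CSV shape (a toy regex approach of my own, not the linked tool's implementation; a real version would use a proper SQL parser and handle types like `VARCHAR(255)` whose commas break naive splitting):

```python
import re

def schema_to_csv(sql: str) -> str:
    # Flatten simple CREATE TABLE statements into table,column,type rows.
    # Naive: assumes one plain "name TYPE" column per comma, no constraints.
    rows = ["table_name,column_name,data_type"]
    for table, body in re.findall(r"CREATE TABLE (\w+)\s*\((.*?)\);", sql, re.S):
        for column in body.split(","):
            parts = column.split()
            if len(parts) >= 2:
                rows.append(f"{table},{parts[0]},{parts[1]}")
    return "\n".join(rows)

print(schema_to_csv("CREATE TABLE users (id INT, name TEXT);"))
```

The resulting compact rows spend far fewer prompt tokens than the raw DDL while keeping the names and types the model actually needs.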
Tried hard, couldn't find any similar code.
Huh? Is this a thing? There are endless online code formatting sites. It takes two seconds. Why would anyone ever do this? I don't get it.
As someone who is "not a developer" - I use the following process to help me:
1. I set up StyleGuide rules for the AI, telling it how to write out my files/scripts:
- Always provide full path, description of function, invocation examples, and version number.
- Frequently have it summarize and explain the project, project logic, and a particular file's functions.
- Have it create a README.MD for the file/project
- Tell it to give me mermaid diagrams and swim diagrams for the logic/code/project/process
- Close prompts with "Review, Explain, Propose, Confirm, Execute" <-- This has it review the code/problem/prompt, explain what it understands, propose what it's been asked to provide, confirm that it's correct (or I add more detail here), then execute and go with creating the artifacts.
I do this because Claude and ChatGPT are FN malevolent in their ignoring of project files/context, and they hallucinate as soon as their context window/memory fills up.
Further, they very frequently "forget" to refer to the uploaded project context files or the artifacts they themselves have proposed and written, etc.
But asking for a README with code, mermaid diagrams, and the logic is helpful to keep me on track.
However, I have seen a lot of sellers install W11 on incompatible devices using a few tricks. I'm not sure how you would check for that in a search tool, but great job otherwise! I'll definitely be using this in the future (and I think you should pass everything through affiliate links! Pay for the upkeep at least)