July 20th, 2024

Converting Codebases with LLMs

Mantle discusses using Large Language Models (LLMs) to convert codebases, emphasizing benefits like improved maintainability and performance. They highlight strategies for automating code translation and optimizing the process.

In the article "Working with AI (Part 2): Code Conversion," Mantle describes how they converted a prototype project into a production project with the help of Large Language Models (LLMs). They outline the challenges of converting a codebase from one language to another and the benefits organizations can gain from such conversions: improved maintainability, better performance, access to a larger talent pool, and better suitability for production use cases. Mantle's strategy relied on LLMs with context windows of over 1 million tokens, with the goal of automating the translation of code while the team focused on high-value tasks. They optimized code generation by supplying context such as existing code patterns, libraries, screenshots, and already generated code. They emphasize the importance of compiling comprehensive context so that the model can reason about the code, and they outline a systematic approach to generating files, working from the backend to the frontend. The article concludes by highlighting the efficiencies gained from using LLMs for code conversion and the potential for further improvement as token windows expand and models get better at understanding and generating code.
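The context-compilation and backend-to-frontend ordering described in the summary could look roughly like the sketch below. This is not Mantle's implementation: the `ContextItem` type, the 4-characters-per-token estimate, and the directory prefixes in `GENERATION_ORDER` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    label: str  # e.g. "existing code pattern" or "already generated file"
    text: str


def estimate_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token. Real tokenizers differ.
    return max(1, len(text) // 4)


def build_prompt(task: str, context: list[ContextItem], budget: int) -> str:
    """Append context items in order until the token budget would be exceeded."""
    parts = [task]
    used = estimate_tokens(task)
    for item in context:
        block = f"### {item.label}\n{item.text}"
        cost = estimate_tokens(block)
        if used + cost > budget:
            break  # stop here rather than cut an item in half
        parts.append(block)
        used += cost
    return "\n\n".join(parts)


# Hypothetical layer prefixes, listed backend first and frontend last.
GENERATION_ORDER = ["db/", "api/", "ui/"]


def plan_generation(files: list[str]) -> list[str]:
    """Order files so that backend layers are generated before frontend ones."""
    def rank(path: str) -> int:
        for index, prefix in enumerate(GENERATION_ORDER):
            if path.startswith(prefix):
                return index
        return len(GENERATION_ORDER)  # unknown layers go last

    return sorted(files, key=rank)
```

Generating the backend first means that each later prompt can include the already generated backend files as context, which is one plausible reading of why the article orders the work this way.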

17 comments

By @Frieren - 10 months

> There is a recurring need in the software world for teams to convert a codebase from one language to another.

Sounds more like a sales pitch than a reality. I have often seen developers excited to port code from one language to another, but only because it is an opportunity to learn something new, do something different for a change, and even rewrite old code.

What is the value if it is done automatically, nobody learns anything, and the code is just a transcript of the old one?

By @ktzar - 10 months

I wonder how many subtle errors will make their way into the new codebase (decimal rounding, a library call where a parameter is ignored and there are no tests for it...) only to be found in production, and AI will be blamed.
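One concrete instance of the rounding problem, offered as an illustration rather than as the case the commenter had in mind: Python 3's built-in `round` rounds ties to the nearest even integer, while JavaScript's `Math.round` rounds ties toward positive infinity. A line-by-line port between the two can therefore change results without any error being raised. The helper names below are hypothetical.

```python
import math


def round_half_even(x: float) -> int:
    # Python 3 behaviour: ties go to the nearest even integer.
    return round(x)


def round_half_up(x: float) -> int:
    # Emulates JavaScript's Math.round: ties go toward positive infinity.
    return math.floor(x + 0.5)


for value in (0.5, 1.5, 2.5):
    print(value, round_half_even(value), round_half_up(value))
# 0.5 -> 0 vs 1
# 1.5 -> 2 vs 2
# 2.5 -> 2 vs 3
```

A test suite that never exercises a tie value such as 2.5 would pass against both versions.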

By @zcbenz - 10 months

A few months ago I ported ~15k lines of python code (10k are tests) to typescript, using GPT4. It cost me ~$70.

The python project is https://github.com/ml-explore/mlx and the converted project is https://github.com/frost-beta/node-mlx

I wrote a long prompt: https://github.com/frost-beta/node-mlx/blob/main/tests/promp...

The first result was almost always bad, but after manually modifying the assistant's answer, subsequent generations usually went much better.

By @bustodisgusto - 10 months

This is a perfect use case for LLMs at the moment. I wrote a script to update an Express codebase to Hono. I got Claude to write a regex that would match each handler to its route, then called the Claude 3.5 API with an example conversion and some other relevant context.

With the right prompt, it produced extremely clean and workable code.

~20 controller files and over 100 route handlers were converted in about 20 minutes and 5 dollars.

The engineering cost of migrating code bases is trending to 0
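A minimal sketch of the kind of script described in this comment, assuming simple Express registrations of the form `app.get('/path', handlerName)`. The regex, the function names, and the prompt wording are all hypothetical, not the commenter's actual code, and the call to the model API is deliberately left out.

```python
import re

# Matches registrations such as: app.get('/users', listUsers)
# Assumption: one named handler per call, no inline functions or middleware.
ROUTE_RE = re.compile(
    r"\b(?:app|router)\.(?P<method>get|post|put|patch|delete)\(\s*"
    r"(?P<quote>['\"])(?P<path>[^'\"]+)(?P=quote)\s*,\s*"
    r"(?P<handler>[A-Za-z_$][\w$]*)\s*\)"
)


def find_routes(source: str) -> list[dict[str, str]]:
    """Return the method, path, and handler name for each route in the source."""
    return [
        {
            "method": match["method"].upper(),
            "path": match["path"],
            "handler": match["handler"],
        }
        for match in ROUTE_RE.finditer(source)
    ]


def build_conversion_prompt(
    route: dict[str, str],
    handler_source: str,
    example_before: str,
    example_after: str,
) -> str:
    """Assemble one prompt per handler, with a worked example as context."""
    return "\n\n".join([
        "Convert the Express route handler below to Hono, following the example.",
        f"Example (Express):\n{example_before}",
        f"Example (Hono):\n{example_after}",
        f"Route: {route['method']} {route['path']} (handler: {route['handler']})",
        f"Handler source (Express):\n{handler_source}",
    ])
```

Sending one handler per request keeps each prompt small and makes a bad conversion easy to isolate and retry. Handlers defined inline or wrapped in middleware would need a real parser rather than a regex.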

By @DarkContinent - 10 months

It's not clear to me from the article how Mantle ported the build scripts, infrastructure config files, etc. across languages. Typically these files don't translate cleanly from one framework to another. Was this considered part of the 20% of the project left to human engineering effort?

By @largbae - 10 months

I wonder if LLM language conversions will lead to a consolidation of languages. Suppose that you could prototype in any language and autoconvert that resulting functionality to Rust or another language with the right runtime features, would that be an appealing dev model?

By @JTyQZSnP3cQGa8B - 10 months

> a recurring need in the software world for teams to convert a codebase from one language to another

Really? I've only seen that twice in my career, and it was due to being written in the most obsolete tech ever.

I have the same comment for the "patterns" that GPT-bros seem to be stuck in all the time. What kind of software are they writing that needs 80% of duplicated/useless code, and 20% of business code? They should first read Refactoring by Martin Fowler, and try to avoid those mistakes in the future because it's bad to rely on an AI for what should be their job, i.e. engineering software.

> the database querying layer was quite verbose and greatly exceeded an LLM’s output token limit

No technical details as usual, only high-level stories. And how is it possible nowadays to have that kind of issue where most languages have their own SQL or REST library to do everything in, at most, 500 lines of code (if the code is duplicated)?

Last but not least, the main web site is a very pretty empty page if JS is disabled. They should fix that with an LLM and write a blog post, that would be more interesting.

By @rock_artist - 10 months

Recently I’ve converted some code to make an app from python to Swift. I’ve tried using Gemini and ChatGPT. The time I’ve spent afterwards debugging it in order to fix introduced bugs made it not worth it.

IMHO, the way this could work is only if you have very good test coverage so you can run them. But without it this can easily go off the tracks.
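The test-driven check suggested here can be sketched as a differential test: run the original and the ported implementation on the same inputs and report every disagreement. The sketch assumes both versions can be called from Python (for a Swift port that might mean a wrapper around a subprocess, or outputs recorded to a file). All names are hypothetical.

```python
import math
from typing import Any, Callable, Iterable


def _outcome(fn: Callable[..., Any], args: tuple) -> tuple:
    """Capture either the return value or the exception type of a call."""
    try:
        return ("ok", fn(*args))
    except Exception as exc:
        return ("error", type(exc).__name__)


def compare_implementations(
    original: Callable[..., Any],
    ported: Callable[..., Any],
    inputs: Iterable[tuple],
) -> list[tuple]:
    """Return (args, expected, actual) for every input where the two differ."""
    mismatches = []
    for args in inputs:
        expected = _outcome(original, args)
        actual = _outcome(ported, args)
        if expected != actual:
            mismatches.append((args, expected, actual))
    return mismatches


# Example: a port that silently changed the rounding rule.
def original_total(price: float) -> int:
    return round(price)  # ties to even


def ported_total(price: float) -> int:
    return math.floor(price + 0.5)  # ties toward positive infinity


print(compare_implementations(original_total, ported_total, [(1.5,), (2.5,)]))
# [((2.5,), ('ok', 2), ('ok', 3))]
```

The check is only as good as the inputs it is fed, which is why existing test coverage matters so much for a port.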

By @pmarreck - 10 months

That's odd. I was discussing this very idea with ChatGPT just last night, in the context of coming up with a way to deterministically go from <example code> to <english language description of example code> and back again, and then thought that English might be a good intermediate language when converting logic to a different programming language...

https://chatgpt.com/share/5d2245e8-135e-44f4-a204-401e625183...

By @impure - 10 months

I used an LLM to convert my XML parser from Dart to Go. It was mostly right but with some giant mistakes. This was when I was extremely new to Go, don’t know if I would do it again. It might be faster to manually write the code because that way I could spend less time reading it.

By @gregors - 10 months

I'm curious about the security implications and corporate policies about uploading your entire codebase to an LLM where others can access it (indirectly or directly).

Other than that, I'm very interested to see how easily opensource libraries could be converted from ecosystem A to B.

By @smusamashah - 10 months

What is the current best LLM for coding? I am using Claude Sonnet 3.5 free and it's so good. I am not making anything serious and an LLM is perfect for that.

Which current models are better than sonnet for code (plain old html JS is my use case btw)?

By @mspreij - 10 months

This should help porting all the old Cobol and Perl apps out there, no?

By @newzisforsukas - 10 months

https://github.com/facebookresearch/CodeGen

By @redleggedfrog - 10 months

How maintainable is the code for humans?
