Ask HN: How to deal with AI generated sloppy code
The author raises concerns about AI-generated code being overly complex and bloated, complicating debugging and maintenance, and invites the tech community to share their strategies for managing these issues.
The author, a tech business owner and consultant, expresses concerns about the increasing reliance on AI for code generation among tech CEOs. He notes that while AI tools can produce functional code, the output often results in overly complex and bloated codebases filled with excessive types, indirection, and unnecessary functions. This complexity complicates the process of code review and debugging, making it difficult for consultants to identify and resolve issues. The author draws a parallel to Java programming, where the availability of advanced tooling has led to similar problems of over-engineering. He emphasizes that while AI-generated code typically works, it is often poorly structured, leading to long-term maintenance challenges. The author seeks input from others in the tech community on how they are addressing these issues in their own codebases.
- AI-generated code can be overly complex and bloated.
- Increased indirection in code makes debugging more difficult.
- The author compares current issues to past challenges with Java programming.
- While AI code often functions, it may lead to long-term maintenance problems.
- The author invites discussion on how others are managing these challenges.
Related
Ask HN: Will AI make us unemployed?
The author highlights reliance on AI tools like ChatGPT and GitHub Copilot, noting a 30% efficiency boost and concerns about potential job loss due to AI's increasing coding capabilities.
Up to 90% of my code is now generated by AI
A senior full-stack developer discusses the transformative impact of generative AI on programming, emphasizing the importance of creativity, continuous learning, and responsible integration of AI tools in coding practices.
Why Copilot Is Making Programmers Worse at Programming
AI-driven coding tools like Copilot may enhance productivity but risk eroding fundamental programming skills, fostering dependency, reducing learning opportunities, isolating developers, and creating a false sense of expertise.
- Many commenters express concerns about the complexity and maintainability of AI-generated code, likening it to legacy code issues.
- Several users suggest using AI tools for code review and testing rather than code generation, emphasizing the importance of human oversight.
- There is a consensus that while AI can assist in coding, it often produces bloated or suboptimal code that requires significant human intervention.
- Some commenters highlight the competitive nature of software development, suggesting that AI will increase demand for skilled engineers to manage and refine AI-generated outputs.
- Others advocate for better documentation and communication practices to mitigate the challenges posed by AI-generated code.
If you're working on a Java project, consider prompting the AI to first write a "pseudocode solution" in a more concise/low boilerplate/"highly expressive" language — Ruby, for example — and then asking it to translate its own "pseudocode" into Java.
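A two-step prompt for this might look something like the following (wording is mine, purely illustrative):

    Step 1: "Sketch a solution to [problem] in idiomatic Ruby.
             Optimize for brevity and clarity, not performance."
    Step 2: "Now translate your Ruby sketch into Java, preserving its
             structure. Do not add interfaces, factories, or any
             indirection the Ruby version didn't need."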
(Mind you, I'm not sure if you can modify the implicit system prompt used by automatic coding-assistance systems like Cursor. [Can you? Anyone know?] I'm more just thinking about how you'd do this if you were treating a ChatGPT-like chatbot as if it were StackOverflow — which is personally where I find most of the value in LLM-assisted coding.)
Alternately, consider prompting the AI to "act like a senior $lang engineer forced to write in Java for this project, who will attempt to retain their existing opinionated coding style they learned from decades of experience in $lang, in the Java code they write" — where $lang is either, once again, a more expressive language; or, more interestingly, a language with a community that skews away from junior engineers (i.e. a language that is rarely anyone's first programming language) and toward high-quality, well-engineered systems code rather than slop CRUD code. For example, Rust, or Erlang/Elixir.
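As a concrete (made-up) instantiation with $lang = Elixir:

    "Act like a senior Elixir engineer forced to write Java for this
     project. Keep the opinionated style you learned from decades of
     Elixir: small pure functions, explicit data flow, no speculative
     abstraction. Write the Java accordingly."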
(Funny enough, this is the exact same mental through-line that would lead a company to wanting to hire people with knowledge of these specific languages.)
If they are still looking to have humans maintain the code, and to hire people to optimize it, then that use case needs to be taken into account. From that perspective, I would argue that it doesn't work.
You could/should charge a higher rate for companies that have a bigger mess to clean up. A cleaning crew is going to charge more to clean up a hoard than to do some light cleaning and re-organizing. The price reflects the effort involved, and I don’t see this as any different. They can choose to pay to write clean code up front, or pay for it later with difficult or impossible maintenance. The choice is theirs.
For what it's worth, everyone on my team hates AI and it's rarely used. I probably use it the most, which is maybe 4 or 5 times a month to write a single line here or there, mostly to save me a search for syntax.
I'm not going to act like I don't use ChatGPT, but I use it to understand things I don't know - like documenting code, helping fix bugs, etc. Never to just write code blindly, I'm just thankful I don't need to maintain the project once I'm done.
The only way I can see this being fixed moving forward is either a breakthrough in coding-focused LLMs, or banning AI-generated code entirely (which I'm not even sure how to do).
Essentially, you need to tighten the code review and testing processes. You cannot have AI doing code generation and code review and/or testing together. Either have AI generate the code and humans review and test it, or have humans write the code and AI support code review and/or testing. Never both.
I find the second approach better: AI supporting the testing and code review processes. Code generation is still the developer's domain, and I don't see AI writing production-level code anytime soon. Developers can use AI to generate snippets of code which are then adjusted to make them production-ready. That is probably the best of both worlds at the moment.
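If you go the review-assist route, constraining the prompt helps; something along these lines (wording is illustrative, not any tool's built-in prompt) keeps the model focused:

    "Review this diff only. Flag: (1) logic errors, (2) unnecessary
     indirection or types that could be removed, (3) missing tests.
     Do not rewrite the code or suggest purely stylistic changes."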
Perhaps we will see new specialized adversarial models developed, focused on minimizing slop output from regular generative models.
We'll need more chips and power plants.
It won't be dissimilar to being a consultant now. I've spent years consulting where my job was to tame legacy code (that is, code which is not under test and nobody knows how it works). The value in roles like this is that you work at a level higher than the code in order to get the team able to understand the system and be able to start extending it again.
Increasing my per-hour rate until I'm comfortable spending hundreds of hours unspaghettifying code.
That’s why the 2000s outsourcing wave didn’t really impact the market for software engineers in the US despite the apocalyptic rhetoric at the time. If Company A has a productive local dev team, their competitor B has to either hire their own or hire a bigger outsourced team to stay competitive. Then company A has to hire even more to meet execs' growth targets, and it becomes a competitive spiral unless economic conditions change (like recently with the elimination of ZIRP).
I think in the near future, LLMs will put that arms race on overdrive and all of us will have to adapt. They’re here to stay, for better or worse.
I handle it by using AI tools for understanding codebases and for code reviews. Properly set up, LLMs can even give decent refactoring advice (e.g. by feeding them a callgraph), but the UX isn't there yet. I use CLI apps and custom scripts, like claude-sync and aider, to do a lot of codebase work with LLMs.
I’d start by trying out aider/Cursor to see whether they can help you manage the code. Use aider's repomap or Cursor's repo indexing features and ask the LLM questions about the codebase. Use the Cursor code review feature, which has caught a lot of subtle bugs for me (I ignored the feature for my first month and was pleasantly surprised when I first used it in lieu of debugging a PR). Experiment with asking the LLM to refactor the codebase, since context windows are now large enough to fit ten-thousand-plus lines. Maybe try GitHub Copilot Chat if you can get beta access, since they do codebase indexing too.
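For a concrete starting point, an aider session for exploring an unfamiliar repo looks roughly like this (flag and command names as I remember them from aider's docs; check aider --help, and the Java path is made up):

    $ cd messy-project/
    $ aider --show-repo-map        # print the repo map aider builds
    $ aider src/main/java/Billing.java
    > /ask How does invoice creation flow through this file?
    > Collapse the three single-use interfaces in this file into
      plain methods on Billing, keeping behavior identical.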
My advice boils down to “fight fire with fire”. In the hands of competent engineers, LLMs are a lot more powerful than the slop you’re seeing. (For context I’m working at the intersection of C++/Rust/Qt for which training data is basically nonexistent)
There is proven value in first having a chain-of-thought conversation to verify the scope, design, and architecture, and only then beginning to implement it.
At the same time, folks are using Cursor with markdown files to maintain this information at all times (included with every prompt), which appears to significantly increase code quality and get the code written the way you want. Having a coding-style instruction list is critical.
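A minimal sketch of such an instruction list, in a Cursor rules file (the specific rules are just examples of the kind of constraints people use):

    # .cursorrules
    - Prefer plain functions over new classes; no single-use interfaces.
    - No new abstraction layer unless at least two concrete callers exist.
    - Match the style of the surrounding code; keep diffs minimal.
    - If a change exceeds ~50 lines, stop and explain the plan first.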
Java is an interesting beast though: it's got so much built in, just a call away. No environments to mess with, no packages to really install; you get the whole universe available. For this, there are some better coding-specific models, one of which is DeepSeek. That model is focused almost exclusively on coding right now, and the last I checked, their 2.5 model was outperforming Sonnet, and it's self-hostable.
I'd probably combine this with something like aider.chat to begin with.
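A minimal setup sketch, assuming you use DeepSeek's hosted API rather than self-hosting (the exact model identifier may differ by aider version; check aider's model docs):

    $ export DEEPSEEK_API_KEY=your-key-here
    $ aider --model deepseek/deepseek-chat Main.java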
I've had a bit more luck using something like aider.chat for coding things off the Python/JavaScript/PHP trail.
The main thing I'm learning from coding YouTube right now is that there are a lot of ways people code. Some think it through beforehand, some as they go, some not at all; the results vary, and there's no perfect way to do it.
Knowing what to ask, where, and in which way is the secret, I've found. It's often better to start with Claude/GPT-4o on the web to get the inputs ready before moving into a coding environment.
(When I tried with Zig it was a disaster...)
It was an uphill battle, and honestly, for that time, I just switched to the editors they were using. Otherwise, I'd just be seen as being less productive.
For my own personal projects, I could spend more time just keeping things simple, and using vim. Though recently, I've switched to the VS Code-based Cursor. I've found that it works best when I use it as a guide, rather than as a code writer. I'm often working on an atypical stack, and the code it generates just uses a wrong or outdated API. Even Cursor is terrible at pulling in the right context and documentation. The mental habit of thinking it'll "just solve the problem" is wrong too, and wastes my time. It's much better if I think of it as a guide and ask it to explain everything I need to do step by step, so I'm following along, rather than expecting it to one-shot it all for me.
So, in some sense, you'll probably need to go with the flow. Using Sourcegraph or other code-navigation tooling will help in the short term.
But in this sense, I agree with the handmade people: keeping things short, simple, and fast will be a business advantage over time. That's assuming you've validated the need. And find companies that understand that AI won't be able to one-shot tasks (beyond very typical junior programming tasks) for a while to come.
You might as well get good at this now, because the future is AI generated code and AI assisted code maintenance. And by the future, I mean the present.
It sometimes "works on their machine". As you aptly pointed out - it often has serious design issues.
> So how are you all handling this problem?
Do nothing. The problem will resolve itself. Eventually tech debt will crush them. Probably soon.
You can even give examples of what to do (simplify and stick to the objective) and what not to do (be verbose and add no value).
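For instance (wording illustrative):

    Do: one 15-line method that parses the file and returns the result.
    Don't: a Parser interface, an abstract factory, and a config object
    to do the same job.
    Follow the "Do" style.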
Although the code never works the first time, it has happened that the AI suggested some nice abstraction and changed some function and variable names to make them easier for me to remember next time...
I also prefer spoon-feeding AI and having a conversation about specific pieces of code to refactor, or features to add. That leads to smaller and more pragmatic use of AI vs the "build me an app that does XYZ in 2 seconds" that leads to code-dump mode.
What actual problem are you running into?
If it’s purely that gen AI produces slop, that’s par for the course in consulting with startups. You will find a lot of humans willing to generate slop all on their own. Some will even do it for free. Fundamentally, if your client could figure it out themselves they wouldn’t be paying your hourly so slop represents part of the opportunity space.
LLMs will just create new opportunities in consulting. But the game is fundamentally the same - communicate about what you understand versus what you don’t, ask a lot of questions and underpromise in the beginning.
Stick to the scams and whatever else, or go work at a PE firm.
Source: my experience working with such types.
If it saves you time to fix it, it's still a win. If you find yourself spending more time debugging, just write it by hand. There's a lot of investors that want to "make AI happen" but the truth is more nuanced. If it works, use it. LLMs aren't good or bad, they are just another tool.
1. put together a big process flow diagram showing some of the dead code paths and ridiculous complexity and how it is causing bugs — show what they’ve done. show them the problem.
2. show a diagram of a comparable/desirable stack (i.e. fewer bugs/easier to change) — what you would have done. show them the solution.
3. put forward a longer term contract or something. basically charge more to rewrite the whole thing — what it’s going to cost them and how they need to commit to change.
-
step three is important. i would not move forward today if they don’t make a suitable commitment. hand wavy verbal statements are not good enough. getting something written on paper is key.
if they’re not willing to write it down then walk away — they haven’t had enough pain yet and aren’t willing to change their approach.
i did not get a written commitment from the django shop and i regret it. i am no longer there (and much happier).
It definitely saves some time and reduces cognitive load for me.
But at the moment I'd not trust the output blindly.
I have not used LLMs to generate larger codebases.
I don't think the tech is there yet where you can get a bot to make a good contribution to a codebase on its own.
> But it's just written in the worst possible manner and maintaining it long term is going to be so much harder if instead they had just handwritten the code.
Just charge more for LLM-generated crappy code, for the above reasons?
Bold.
The bigger problem is what to do about the humans who are fixing this sloppy code right now.