Ask HN: How to deal with AI generated sloppy code
The author raises concerns about AI-generated code being overly complex and bloated, complicating debugging and maintenance, and invites the tech community to share their strategies for managing these issues.
The author, a tech business owner and consultant, expresses concerns about the increasing reliance on AI for code generation among tech CEOs. He notes that while AI tools can produce functional code, the output often results in overly complex and bloated codebases filled with excessive types, indirection, and unnecessary functions. This complexity complicates the process of code review and debugging, making it difficult for consultants to identify and resolve issues. The author draws a parallel to Java programming, where the availability of advanced tooling has led to similar problems of over-engineering. He emphasizes that while AI-generated code typically works, it is often poorly structured, leading to long-term maintenance challenges. The author seeks input from others in the tech community on how they are addressing these issues in their own codebases.
- AI-generated code can be overly complex and bloated.
- Increased indirection in code makes debugging more difficult.
- The author compares current issues to past challenges with Java programming.
- While AI code often functions, it may lead to long-term maintenance problems.
- The author invites discussion on how others are managing these challenges.
Related
Ask HN: Will AI make us unemployed?
The author highlights reliance on AI tools like ChatGPT and GitHub Copilot, noting a 30% efficiency boost and concerns about potential job loss due to AI's increasing coding capabilities.
Up to 90% of my code is now generated by AI
A senior full-stack developer discusses the transformative impact of generative AI on programming, emphasizing the importance of creativity, continuous learning, and responsible integration of AI tools in coding practices.
Why Copilot Is Making Programmers Worse at Programming
AI-driven coding tools like Copilot may enhance productivity but risk eroding fundamental programming skills, fostering dependency, reducing learning opportunities, isolating developers, and creating a false sense of expertise.
- Many commenters express concerns about the complexity and maintainability of AI-generated code, likening it to legacy code issues.
- Several users suggest using AI tools for code review and testing rather than code generation, emphasizing the importance of human oversight.
- There is a consensus that while AI can assist in coding, it often produces bloated or suboptimal code that requires significant human intervention.
- Some commenters highlight the competitive nature of software development, suggesting that AI will increase demand for skilled engineers to manage and refine AI-generated outputs.
- Others advocate for better documentation and communication practices to mitigate the challenges posed by AI-generated code.
If you're working on a Java project, consider prompting the AI to first write a "pseudocode solution" in a more concise/low boilerplate/"highly expressive" language — Ruby, for example — and then asking it to translate its own "pseudocode" into Java.
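A two-step prompt for this might look something like the following (wording is mine, purely illustrative):

    Step 1: "Sketch a solution to [problem] in idiomatic Ruby.
             Optimize for brevity and clarity, not performance."
    Step 2: "Now translate your Ruby sketch into Java, preserving its
             structure. Do not add interfaces, factories, or any
             indirection the Ruby version didn't need."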
(Mind you, I'm not sure if you can modify the implicit system prompt used by automatic coding-assistance systems like Cursor. [Can you? Anyone know?] I'm more just thinking about how you'd do this if you were treating a ChatGPT-like chatbot as if it were StackOverflow — which is personally where I find most of the value in LLM-assisted coding.)
Alternately, consider prompting the AI to "act like a senior $lang engineer forced to write in Java for this project, who will attempt to retain their existing opinionated coding style they learned from decades of experience in $lang, in the Java code they write" — where $lang is either, once again, a more expressive language; or, more interestingly, a language with a community that skews away from junior engineers (i.e. a language that is rarely anyone's first programming language) and toward high-quality, well-engineered systems code rather than slop CRUD code. For example, Rust, or Erlang/Elixir.
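As a concrete (made-up) instantiation with $lang = Elixir:

    "Act like a senior Elixir engineer forced to write Java for this
     project. Keep the opinionated style you learned from decades of
     Elixir: small pure functions, explicit data flow, no speculative
     abstraction. Write the Java accordingly."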
(Funny enough, this is the exact same mental through-line that would lead a company to wanting to hire people with knowledge of these specific languages.)
If they are still looking to have humans maintain the code, and to hire people to optimize it, then that use case needs to be taken into account. From that perspective, I would argue that it doesn't work.
You could/should charge a higher rate for companies that have a bigger mess to clean up. A cleaning crew is going to charge more to clean up a hoard than to do some light cleaning and re-organizing. The price reflects the effort involved, and I don’t see this as any different. They can choose to pay to write clean code up front, or pay for it later with difficult or impossible maintenance. The choice is theirs.
For what it's worth, everyone on my team hates AI and it's rarely used. I probably use it the most, which is maybe 4 or 5 times a month to write a single line here or there, mostly to save me a search for syntax.
I'm not going to act like I don't use ChatGPT, but I use it to understand things I don't know - like documenting code, helping fix bugs, etc. Never to just write code blindly, I'm just thankful I don't need to maintain the project once I'm done.
The only way I can see this being fixed moving forward is either a breakthrough in coding-focused LLMs, or banning AI-generated code entirely (which I'm not even sure how to do).
Essentially, you need to tighten the code review and testing processes. You cannot have AI doing code generation and code review and/or testing together. Either have AI generate the code and humans review and test it, or have humans write the code and AI support code review and/or testing. Never both.
I find the second approach better: AI supporting the testing and code review processes. Code generation is still the developer's domain, and I don't see AI writing production-level code anytime soon. Developers can use AI to generate snippets of code which are then adjusted to make them production-ready. That is probably the best of both worlds at the moment.
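If you go the review-assist route, constraining the prompt helps; something along these lines (wording is illustrative, not any tool's built-in prompt) keeps the model focused:

    "Review this diff only. Flag: (1) logic errors, (2) unnecessary
     indirection or types that could be removed, (3) missing tests.
     Do not rewrite the code or suggest purely stylistic changes."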
Perhaps we will see new specialized adversarial models developed, focused on minimizing slop output from regular generative models.
We'll need more chips and power plants.
It won't be dissimilar to being a consultant now. I've spent years consulting where my job was to tame legacy code (that is, code which is not under test and nobody knows how it works). The value in roles like this is that you work at a level higher than the code in order to get the team able to understand the system and be able to start extending it again.
Increasing my per-hour rate until I'm comfortable spending hundreds of hours unspaghettifying code.
That’s why the 2000s outsourcing wave didn’t really impact the market for software engineers in the US despite the apocalyptic rhetoric at the time. If Company A has a productive local dev team, their competitor B has to either hire their own or hire a bigger outsourced team to stay competitive. Then company A has to hire even more to meet execs' growth targets, and it becomes a competitive spiral unless economic conditions change (like recently with the elimination of ZIRP).
I think in the near future, LLMs will put that arms race on overdrive and all of us will have to adapt. They’re here to stay, for better or worse.
I handle it by using AI tools for understanding codebases and for code reviews. Properly set up, LLMs can even give decent refactoring advice (e.g. by feeding them a callgraph), but the UX isn't there yet. I use CLI apps and custom scripts, like claude-sync and aider, to do a lot of codebase work with LLMs.
I’d start by trying out aider/Cursor to see whether they can help you manage the code. Use aider's repomap or Cursor's repo indexing features and ask the LLM questions about the codebase. Use the Cursor code review feature, which has caught a lot of subtle bugs for me (I ignored the feature for my first month and was pleasantly surprised when I first used it in lieu of debugging a PR). Experiment with asking the LLM to refactor the codebase, since context windows are now large enough to fit ten-thousand-plus lines. Maybe try GitHub Copilot Chat if you can get beta access, since they do codebase indexing too.
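For a concrete starting point, an aider session for exploring an unfamiliar repo looks roughly like this (flag and command names as I remember them from aider's docs; check aider --help, and the Java path is made up):

    $ cd messy-project/
    $ aider --show-repo-map        # print the repo map aider builds
    $ aider src/main/java/Billing.java
    > /ask How does invoice creation flow through this file?
    > Collapse the three single-use interfaces in this file into
      plain methods on Billing, keeping behavior identical.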
My advice boils down to “fight fire with fire”. In the hands of competent engineers, LLMs are a lot more powerful than the slop you’re seeing. (For context I’m working at the intersection of C++/Rust/Qt for which training data is basically nonexistent)
There is proven value in first having a chain-of-thought conversation to verify the scope, design, and architecture, and only then beginning to implement it.
At the same time, folks are using Cursor with markdown files to maintain this information at all times (included with every prompt), which appears to significantly increase code quality and get the code written the way you want. Having a coding-style instruction list is critical.
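A minimal sketch of such an instruction list, in a Cursor rules file (the specific rules are just examples of the kind of constraints people use):

    # .cursorrules
    - Prefer plain functions over new classes; no single-use interfaces.
    - No new abstraction layer unless at least two concrete callers exist.
    - Match the style of the surrounding code; keep diffs minimal.
    - If a change exceeds ~50 lines, stop and explain the plan first.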
Java is an interesting beast though: it's got so much built in, just a call away. No environments to mess with, no packages to really install; you get the whole universe available. For this, there are some better coding-specific models, one of which is DeepSeek. That model is focused almost exclusively on coding right now, and the last I checked, their 2.5 model was outperforming Sonnet, and it's self-hostable.
I'd probably combine this with something like aider.chat to begin with.
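A minimal setup sketch, assuming you use DeepSeek's hosted API rather than self-hosting (the exact model identifier may differ by aider version; check aider's model docs):

    $ export DEEPSEEK_API_KEY=your-key-here
    $ aider --model deepseek/deepseek-chat Main.java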
I've had a bit more luck using something like aider.chat for coding things off the Python/JavaScript/PHP trail.
The main thing I'm learning from coding YouTube right now is that there are a lot of ways people code. Some think it through beforehand, some as they go, some not at all; the results vary, and there's no perfect way to do it.
Knowing what to ask, where, and in which way is the secret, I've found. It's often better to start with Claude/GPT-4o on the web to get the inputs ready before moving into a coding environment.
(When I tried with Zig it was a disaster...)
It was an uphill battle, and honestly, for that time, I just switched to the editors they were using. Otherwise, I'd just be seen as being less productive.
For my own personal projects, I could spend more time just keeping things simple, and using vim. Though recently, I've switched to the VS Code-based Cursor. I've found that it works best when I use it as a guide, rather than as a code writer. I'm often working on an atypical stack, and the code it generates just uses a wrong or outdated API. Even Cursor is terrible at pulling in the right context and documentation. The mental habit of thinking it'll "just solve the problem" is wrong too, and wastes my time. It's much better if I think of it as a guide and ask it to explain everything I need to do step by step, so I'm following along, rather than expecting it to one-shot it all for me.
So, in some sense, you'll probably need to go with the flow. Using Sourcegraph or other code-navigation tooling will help in the short term.
But in this sense, I agree with the handmade people: keeping things short, simple, and fast will be a business advantage over time. That's assuming you've validated the need. And find companies that understand that AI won't be able to one-shot tasks (beyond very typical junior programming tasks) for a while to come.
You might as well get good at this now, because the future is AI generated code and AI assisted code maintenance. And by the future, I mean the present.
It sometimes "works on their machine". As you aptly pointed out - it often has serious design issues.
> So how are you all handling this problem?
Do nothing. The problem will resolve itself. Eventually tech debt will crush them. Probably soon.
You can even give examples of what to do (simplify and stick to the objective) and what not to do (be verbose and add no value).
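For instance (wording illustrative):

    Do: one 15-line method that parses the file and returns the result.
    Don't: a Parser interface, an abstract factory, and a config object
    to do the same job.
    Follow the "Do" style.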
Although the code never works the first time, it has happened that the AI suggested some nice abstraction and changed some function and variable names to make them easier for me to remember next time...
I also prefer spoon-feeding AI and having a conversation about specific pieces of code to refactor, or features to add. That leads to smaller and more pragmatic use of AI vs the "build me an app that does XYZ in 2 seconds" that leads to code-dump mode.
What actual problem are you running into?
If it’s purely that gen AI produces slop, that’s par for the course in consulting with startups. You will find a lot of humans willing to generate slop all on their own. Some will even do it for free. Fundamentally, if your client could figure it out themselves they wouldn’t be paying your hourly so slop represents part of the opportunity space.
LLMs will just create new opportunities in consulting. But the game is fundamentally the same - communicate about what you understand versus what you don’t, ask a lot of questions and underpromise in the beginning.
Stick to the scams and whatever else, or go work at a PE firm.
Source: my experience working with such types.
If it saves you time to fix it, it's still a win. If you find yourself spending more time debugging, just write it by hand. There's a lot of investors that want to "make AI happen" but the truth is more nuanced. If it works, use it. LLMs aren't good or bad, they are just another tool.
1. put together a big process flow diagram showing some of the dead code paths and ridiculous complexity and how it is causing bugs — show what they’ve done. show them the problem.
2. show a diagram of a comparable/desirable stack (i.e. fewer bugs/easier to change) — what you would have done. show them the solution.
3. put forward a longer term contract or something. basically charge more to rewrite the whole thing — what it’s going to cost them and how they need to commit to change.
-
step three is important. i would not move forward today if they don’t make a suitable commitment. hand wavy verbal statements are not good enough. getting something written on paper is key.
if they’re not willing to write it down then walk away — they haven’t had enough pain yet and aren’t willing to change their approach.
i did not get a written commitment from the django shop and i regret it. i am no longer there (and much happier).
It definitely saves some time and reduces cognitive load for me.
But at the moment I'd not trust the output blindly.
I have not used LLMs to generate larger codebases.
I don't think the tech is there yet where you can get a bot to make a good contribution to a codebase on its own.
> But it's just written in the worst possible manner and maintaining it long term is going to be so much harder if instead they had just handwritten the code.
Just charge more for LLM-generated crappy code, for the above reasons?
Bold.
The bigger problem is what to do about the humans who are fixing this sloppy code right now.