January 14th, 2025

Ask HN: Teams using AI – how do you prevent it from breaking your codebase?

Teams using AI coding assistants face challenges with complex codebases, spending time correcting AI suggestions that disrupt coding patterns. Developing effective workflows is essential, especially for teams with mature codebases.

Teams utilizing AI coding assistants like Copilot, Cursor, and Windsurf are encountering challenges, particularly with complex codebases. Developers are increasingly spending time mitigating issues caused by AI suggestions that often overlook existing coding patterns, recreate already established components, or alter core architecture without understanding the consequences. Additionally, AI tools frequently forget essential context from prior interactions and require ongoing reminders about the team's technology stack decisions. This raises important questions for engineering teams: how often do they find AI making potentially harmful changes, how much time is dedicated to reviewing and correcting AI-generated suggestions, and what workflows have been implemented to keep AI on track? Insights are particularly sought from teams managing mature codebases, as their experiences may provide valuable lessons in effectively integrating AI tools into development processes.

- AI coding assistants can disrupt established coding patterns in complex codebases.

- Developers spend significant time correcting AI-generated code suggestions.

- Teams are encouraged to develop workflows to manage AI's contributions effectively.

- Insights from mature codebase teams are particularly valuable for understanding AI integration challenges.

33 comments
By @nzach - 4 months
> How often do you catch AI trying to make breaking changes?

I think this is the wrong way to think about the problem. The AI isn't breaking your code, the engineers are. AI is just a tool, and you should always keep this in mind when talking about this sort of thing.

But I understand some of the problems you describe while using AI. And the way I work around it is to never let the AI write more than a few lines at a time; ideally it should just complete the line I started to type.

By @epolanski - 4 months
I generally use AI as an auto complete and rarely have it write features.

In general, approach it expecting to have to explain everything, and evaluate whether it makes more sense to use it or just do the work yourself. Often the second option is much faster due to muscle memory and keyboard proficiency.

Even when it comes to boilerplate it will not respect the standards (unless you throw more files into the context), so you need to be specific. In Cursor, I believe you can add more files to the context in all the chats (except the simple one in the current editor window), and it will do a better job.

I think too many people treat AI as a junior coder that has been exposed to the business and its practices, assuming they can give it short sentences and it will understand the task. But no: it's only as good as the detail in your input (which is often not worth the hassle to provide).

In Cursor you can save prompt templates in Composer, by the way.

That being said there are situations where LLMs can severely outperform us. An example is maintenance of legacy code.

E.g. one thing Cursor with Claude is very good at is explaining your code; the less understandable the code is, the more it shines. I don't know if you've noticed, but LLMs can de-obfuscate obfuscated code with ease and can make poorly written code understandable rather quickly.

The other day I fed it an 800-line function that computed the final price of configurations (imagine configuring a car or other items in an e-commerce store), and it was impossible to understand without a huge multi-day effort. Too many business features had been glued together in a single giant body, and I was able to work through it piece by piece (find a better name for this, document that, explain this, refactor this block into a separate function, suggest improvements, and so on).

Another great use case is learning new tools/languages. I hadn't used Python in a decade, and I quickly set up a Jupyter Notebook for financial simulations without having to really understand the language (you can simply ask it).

AI is not limited to what we can keep in mind at the same time (roughly four to six items in short-term memory); 800 lines of context all together is nothing, and you can quickly iterate over such code.

Don't misplace your expectations about LLMs. They are a tool; they make experienced engineers shine when used for the right purpose, but you have to understand their limitations, as with any other tool out there.

By @secult - 4 months
IMO the biggest issue is that instead of reviewing your colleagues' code, you review some randomly generated stuff. You used to know what kind of code you could expect from your colleagues; not anymore. You also used to expect code reviews to promote knowledge and consistency amongst the team and help people become better at programming. Not anymore either.
By @anotherpaulg - 4 months
I use aider to work on the aider code base, which is approaching 30k lines of Python. Aider writes about 70% of the new code in each release [0]. So it's doing some fairly heavy lifting in a non-trivial code base.

Some pragmatic tips:

- Work with AI like a junior developer, except with unlimited energy and no problem being corrected repeatedly.

- Provide the AI with guidance about conventions you expect in your code base, overall architecture, etc [1]. You can often use AI tools to help write the first draft of such "conventions" documents.

- Break the work down into self-contained bite sized steps. Don't ask AI to boil the ocean in one iteration. Ask it to make a sequence of changes that each move the code towards the goal.

- Be willing to explore for a few steps with the AI. If it's going sideways, undo/revert. Hopefully your AI tool has good checkpoint/undo/git support [2].

- Lint and test the code after each AI change. Hopefully your AI tool can automatically do that, and fix problems [3]. (A rough sketch of doing this by hand follows this list.)

- If the AI is stuck, just code yourself until you get past the tricky part. Then resume AI coding when the going is easier.

- Build intuition for what AI tools are good at, use them when they're helpful. Code yourself when not.
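
A rough sketch of the lint-and-test step referenced above, for anyone wiring it up by hand rather than relying on the tool; the specific commands (ruff, pytest) and the git revert fallback are my own assumptions, not aider's built-in behaviour:

```python
import subprocess

def checks_pass() -> bool:
    """Run the linter and the test suite after each AI change.

    ruff/pytest are just example choices; substitute whatever the
    project already uses.
    """
    for cmd in (["ruff", "check", "."], ["pytest", "-q"]):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"{' '.join(cmd)} failed:\n{result.stdout}{result.stderr}")
            return False
    return True

if __name__ == "__main__":
    if not checks_pass():
        # Undo the AI's edit rather than building on top of a broken change.
        subprocess.run(["git", "checkout", "--", "."], check=False)
```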

Some things AI coding is very helpful for:

- Rough first draft of a change or feature. AI can often speed through a bunch of boilerplate. Then you can polish the final touches.

- Writing tests.

- Fixing fairly simple bugs.

- Efficiently solving problems with packages/libraries you may not have known about.

[0] https://aider.chat/HISTORY.html

[1] https://aider.chat/docs/usage/conventions.html

[2] https://aider.chat/docs/git.html

[3] https://aider.chat/docs/usage/lint-test.html

By @aerhardt - 4 months
There is an implication here that you're letting AI commit to your codebase without reviewing it first. Or that humans are committing AI generated code but without reviewing it first. Is that the case?

Because if it is, the answer seems fairly simple: you install a proper code review process, and you don't allow shit code into your codebase.

By @IMTDb - 4 months
AI does not commit code; people do. The origin of the code does not matter, processes remain the same. So:

1. We never catch AI trying to make breaking changes, but we catch developers who do. Since using AI tools we haven’t seen a huge change in those patterns.

2. Prior to opening a PR, developers are now spending more time reviewing code instead of writing it. During the code review process, we use AI to highlight potential issues faster.

3. Human in the middle

By @bradhe - 4 months
You're asking the wrong questions. No one who has any idea what they're doing has AI wholesale write new features or add new code in an unsupervised way.
By @kgilpin - 4 months
I wrote a SWE bench solver. The SWE bench issues are on mature projects like Django.

The objective of my solver was to get good solutions using only RAG (no embeddings) and with minimal cost (low token count).

Three techniques, combined, yielded good results. The first was to take a TDD approach, first generating a test and then requiring the LLM to pass the test (without failing others). It can also trace the test execution to see exactly what code participates in the feature.

The second technique was to separate “planning” from “coding”. The planner is freed from implementation details, and can worry more about figuring out which files to change, following existing code conventions, not duplicating code, etc. In the coding phase, the LLM is working from a predefined plan, and has little freedom to deviate. It just needs to create a working, lint-free implementation.

The third technique was a gentle pressure on the solver to make small changes in a minimum number of files (ideally, one).
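
A rough sketch of that planner/coder split; ask_llm, the prompt wording, and the single-file bias below are placeholders for illustration, not the actual solver:

```python
def ask_llm(system: str, user: str) -> str:
    """Placeholder for whatever model client you actually use."""
    raise NotImplementedError

def plan_change(issue: str, repo_map: str) -> str:
    # Planning phase: decide which files to touch and what steps to take,
    # with no implementation details and a bias towards minimal changes.
    return ask_llm(
        system=("You are the planner. List the files to change and the steps "
                "to take, following existing conventions and avoiding "
                "duplication. Prefer changing a single file. Do not write code."),
        user=f"Issue:\n{issue}\n\nRepository overview:\n{repo_map}",
    )

def implement_plan(plan: str, relevant_files: str) -> str:
    # Coding phase: the model works from the pre-approved plan and has
    # little freedom to deviate from it.
    return ask_llm(
        system=("You are the coder. Implement exactly the plan you are given; "
                "do not deviate. Return a working, lint-free patch."),
        user=f"Plan:\n{plan}\n\nRelevant files:\n{relevant_files}",
    )
```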

AI coding tools today generally don’t incorporate any of this. They don’t favor TDD, they don’t have a bias towards making minimal changes, and they don’t work from a pre-approved design.

Good human developers do these things, and this is a pretty wide gap between adept human coders and AI.

By @awei - 4 months
I mostly copy and paste to and from AI chats. I sometimes have it write a "complete feature or class", but I either read the diff between the generated code and the previously existing code thoroughly, or test it and go back and forth with the LLM.
By @MarkusRabe - 4 months
IMO these are exactly the right observations given the current tools. Maybe give Augment a try (https://www.augmentcode.com/blog/augment-is-free-for-open-so...). Disclosure: I work there. We've tried hard to improve AI models in these dimensions, but there's certainly a long way to go. Feedback would be highly appreciated.
By @BeetleB - 4 months
Do you not have code reviews?

These are all problems with regular humans, too. You prevent it via code reviews.

By @PaulHoule - 4 months
I only use those IDE-integrated assistants on small greenfield projects I work on for my own account. Sometimes they work great, sometimes they don't. (The funniest thing about Windsurf is that it keeps forgetting it's supposed to write C:\some\path on Windows when it uses its tools, as opposed to /C:/some/path, although it will do the right thing for a short time after I advise it.)

At work we have AI policies that revolve around confidentiality and a contract to use Microsoft's Copilot, so that is what I do. I use it as a supplement to looking up answers in the manual. For instance, I had to write some complicated Mockito tests and got sample code personalized to my needs, had it explain why it did certain things, how certain things work, etc. I've also had it give me good advice about how to deal with cases that I screwed up with git.

Often it gives me the insight to confirm things in the manual quickly, but my experience is that Copilot is weak in the citation department: often it gives 100% correct answers and justifies them with 100% wrong citations.

By @thurn - 4 months
If you've hired an intern before, think of Cursor like that. When you come up with an intern project plan, you usually need to give a very clear specification of what you expect, and you usually need to have the project be pretty self-contained. A lot of real problems are really bad intern projects, and a lot of problems are a really bad fit for AI. You have to be strategic about it.
By @f38zf5vdt - 4 months
I have dealt with some codebases that were purely assembled with ML and they tend towards being completely unmaintainable nightmares exhibiting all the worst tenets of things like object-oriented code design. Levels of inheritance 8 modules deep, global variables being passed around everywhere to escape them, the works.

For ML-driven code development, I find it works best when I use it to make pure functions where I know exactly what I expect to go in and out of the function, and the LLM can simultaneously write the tests for it to ensure that it works. LLMs do not plan like humans; even when fine-tuned, they seem to have difficulty integrating knowledge beyond pattern matching.

That being said, 90% of coding is pattern matching to something someone made already. And as long as I'm writing pure functions and providing suitably adequate context for what the model needs to produce, LLMs seem to work wonders. My rule of thumb is to spend 10-20 minutes specifying exactly what I need in the prompt, and then tuning that if I fail to get the expected result.
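
As a toy illustration of that workflow (the function below is invented for the example): pin down the pure function's signature and expected behaviour in the prompt, and have the model produce both the body and a test that locks it in.

```python
from decimal import Decimal

def apply_discount(price: Decimal, percent: Decimal) -> Decimal:
    """Return `price` reduced by `percent` (0-100), rounded to cents.

    A pure function: no I/O, no globals, so the generated test fully
    pins down the behaviour that was specified up front.
    """
    if not Decimal("0") <= percent <= Decimal("100"):
        raise ValueError("percent must be between 0 and 100")
    discounted = price * (Decimal("100") - percent) / Decimal("100")
    return discounted.quantize(Decimal("0.01"))

def test_apply_discount():
    assert apply_discount(Decimal("19.99"), Decimal("25")) == Decimal("14.99")
    assert apply_discount(Decimal("10.00"), Decimal("0")) == Decimal("10.00")
```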

By @sssilver - 4 months
Small functions, small modules, small codebases. Keep state as contained as possible. Tightly control interfaces and interactions. Know your paradigms. Example: in Rust multi threading it loves putting things in an Arc. You have to tell it to use MPSC queues instead.

I love coding with AI. It has made me 100x more productive. I am able to work on my distributed event processing backend in Rust, then switch to my mobile app in Swift, then switch to my embedded device prototype and write a UART driver for a GPS module under ESP32.

I’ve been programming for many years but this level of productivity would have been unimaginable to me without AI.

By @btown - 4 months
I'm surprised that there aren't already greater integrations between Copilot and GitHub pull request reviews - pulling into the prompt context for completions, for instance, the data set of review comments that address any prior attempt to break patterns and modify architecture. Any mature codebase will have had many a senior engineer swatting away these attempts, in writing. Either way, this seems to me to be a context engineering challenge (and privacy-at-scale challenge) more than anything else, and that means that this will all improve over time!
By @notJim - 4 months
Is the issue that the suggestions from the AI tool are not good, or is it that bad code is making it into the repo?

The latter problem should be prevented by code review (first by the developer using the AI tool, and then by their teammates on a PR). Code generated by AI should be reviewed no differently than code written by a human. If you wouldn't approve the PR if a person wrote this code, why would you approve it because an LLM wrote it? If your PR process is not catching these issues, you have a PR process problem, not an AI problem.

The former problem I have no idea.

By @alecfong - 4 months
You write tests, lint, and review the code. Provide clarifying instructions and point to places in code base to improve agent understanding. It’s funny how similar it is to standard engineering!
By @elif - 4 months
If you're going to enable a coding agent with this much authority, you could, for instance, have two layers of agents which review the changes in context of the project as you described, providing the dev agent the appropriate feedback to fix its own errors.

Have all that closed loop before anything makes it to a pull request that a human sees.

Add an agent to write unit tests for all impacted modules, etc.

Essentially, coding is just coding; something also has to do the software engineering.
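
A rough sketch of what such a closed loop could look like; ask_llm, the prompts, and the APPROVED convention are all placeholders, not any particular product's API:

```python
MAX_ROUNDS = 5

def ask_llm(system: str, user: str) -> str:
    """Placeholder for whatever model client you actually use."""
    raise NotImplementedError

def closed_loop(task: str, project_context: str) -> str:
    feedback = "none yet"
    for _ in range(MAX_ROUNDS):
        # Dev agent proposes a change, taking earlier review feedback into account.
        patch = ask_llm(
            system="You are the dev agent. Produce a patch for the task.",
            user=(f"Task:\n{task}\n\nProject context:\n{project_context}\n\n"
                  f"Previous review feedback:\n{feedback}"),
        )
        # Review agent checks the change in the context of the project.
        review = ask_llm(
            system=("You are the review agent. Check the patch against the "
                    "project's architecture and conventions. Reply APPROVED, "
                    "or list the problems to fix."),
            user=f"Project context:\n{project_context}\n\nPatch:\n{patch}",
        )
        if review.strip().startswith("APPROVED"):
            return patch  # only now does it move on to tests and a human-reviewed PR
        feedback = review
    raise RuntimeError("Reviewers never approved; escalate to a human.")
```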

By @alyandon - 4 months
I've been testing LLM dev tooling and I'm still on the fence in general about whether or not the generated code is a net productivity gain for me. I still have to take the time to review the code, fix subtle bugs/corner cases and make changes to reflect the current coding standards of the existing code base. In many respects, it's like handing off a programming task to a somewhat competent intern.
By @SrslyJosh - 4 months
> Some common scenarios I keep running into:

> * AI suggests code that completely ignores existing patterns

> * It recreates components we already have

> * It modifies core architecture without understanding implications

> * It forgets critical context from previous conversations

> * It needs constant reminders about our tech stack decisions

Sounds like a really useful tool!

Serious question: Have you considered just, y'know, learning how to program on your own?

By @asjir - 4 months
Actually, only today I noticed that Cursor's Agent Composer now can properly respond to linting errors.

In the case of my TypeScript codebase, it solves a lot of problems; it probably helps that I use tRPC types rather aggressively (i.e. using UUID types, separate dates and datetimes, etc.).

A few weeks ago it wasn't working nearly as well in my experience.

By @the_af - 4 months
I've thought about this and one thought is:

How does AI making breaking changes or not following established patterns differ from human developers (possibly novices) doing the same?

Which safeguards do you have against human developers randomly copying code from StackOverflow, and why aren't they enough against developers using AI-generated code?

By @jdale27 - 4 months
Same way you prevent humans from breaking your codebase: tests, code reviews, static analysis, etc. I agree with others that say treat AI like an intern or junior dev. They can speed you up if managed well, but need oversight. Over time the AI will get better, and will learn to learn as senior developers do.
By @manishsharan - 4 months
1. Don't use Devin or OpenDevin or whatever

2. Use LLM for code auto complete or ask LLM to code specific functions that you weave together to deliver a feature.

3. Or use it to explain your code. I concatenate all my code and shove it into Gemini and ask it to explain how the legacy stuff works
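
For example, a trivial way to produce that concatenation (the .py filter and the output filename are arbitrary choices):

```python
from pathlib import Path

# Concatenate all source files into one blob to paste into a long-context model.
parts = []
for path in sorted(Path(".").rglob("*.py")):
    parts.append(f"\n\n# ===== {path} =====\n{path.read_text(encoding='utf-8')}")

Path("codebase_dump.txt").write_text("".join(parts), encoding="utf-8")
```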

By @HanClinto - 4 months
I don't use AI tools quite at this level, but from the common scenarios that you list, some of these remind me of the sorts of things that teams run into when they (blindly) add new and enthusiastic developers to a project.

In those cases, one common tool to help mitigate those issues is a somewhat tedious (but very helpful) exercise of establishing a Team Charter.

I wonder if a similar sort of thing would be useful to load into the base prompt / context of every AI-generated code request? We ask our new developers to always develop with our Team Charter in mind -- why not ask Copilot to do the same?

It wouldn't address everything you listed, but I wonder if it would help.

Do you have a Team Charter and coding standards doc already written out? If not, I wonder if it could help to ask a Copilot-type tool to analyze a codebase and establish a coding standards / charter-type document first, and then back-feed that into the system.
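
One cheap way to try that, sketched below with hypothetical file names: keep the charter and coding standards as files in the repo and prepend them to the system prompt of every code-generation request.

```python
from pathlib import Path

# Hypothetical file names; use whatever your team actually maintains.
CHARTER = Path("docs/TEAM_CHARTER.md").read_text(encoding="utf-8")
STANDARDS = Path("docs/CODING_STANDARDS.md").read_text(encoding="utf-8")

# Prepended to every code-generation request, so the assistant is
# "onboarded" roughly the way a new developer would be.
SYSTEM_PROMPT = (
    "You are contributing to an existing codebase. Follow this team charter "
    "and these coding standards in every suggestion you make.\n\n"
    f"{CHARTER}\n\n{STANDARDS}"
)
```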

By @xyzzy4747 - 4 months
In my experience so far with AI, it's primarily good for creating boilerplate pages and APIs and for helping with auto-complete suggestions.

I think eventually AI agents will really take off but not sure if anything works well there yet.

By @BrentOzar - 4 months
It sounds like you're not training it with your existing code base, and that you're running it with relatively small contexts. Have you done any custom LLM training on your code base, and what model are you using?
By @ripped_britches - 4 months
I keep folders of docs that I always pull in with @docs to give Cursor the context for how/why things are done. This goes a very long way, but your docs and code need to be colocated. And it helps to make docs concise.
By @383toast - 4 months
Just add careful documentation and changelog to all modules
By @mirekrusin - 4 months
The list sounds like quite a few people I've worked with.
By @platypii - 4 months
AI is even better at writing tests than writing code. So have it write the tests first and then write the code.