Ask HN: Teams using AI – how do you prevent it from breaking your codebase?
Teams using AI coding assistants face challenges with complex codebases, spending time correcting AI suggestions that disrupt coding patterns. Developing effective workflows is essential, especially for teams with mature codebases.
Teams utilizing AI coding assistants like Copilot, Cursor, and Windsurf are encountering challenges, particularly with complex codebases. Developers are increasingly spending time mitigating issues caused by AI suggestions that often overlook existing coding patterns, recreate already established components, or alter core architecture without understanding the consequences. Additionally, AI tools frequently forget essential context from prior interactions and require ongoing reminders about the team's technology stack decisions. This raises important questions for engineering teams: how often do they find AI making potentially harmful changes, how much time is dedicated to reviewing and correcting AI-generated suggestions, and what workflows have been implemented to keep AI on track? Insights are particularly sought from teams managing mature codebases, as their experiences may provide valuable lessons in effectively integrating AI tools into development processes.
- AI coding assistants can disrupt established coding patterns in complex codebases.
- Developers spend significant time correcting AI-generated code suggestions.
- Teams are encouraged to develop workflows to manage AI's contributions effectively.
- Insights from mature codebase teams are particularly valuable for understanding AI integration challenges.
Related
Why Copilot Is Making Programmers Worse at Programming
AI-driven coding tools like Copilot may enhance productivity but risk eroding fundamental programming skills, fostering dependency, reducing learning opportunities, isolating developers, and creating a false sense of expertise.
Are Devs Becoming Lazy? The Rise of AI and the Decline of Care
The rise of AI tools like GitHub Copilot enhances productivity but raises concerns about developer complacency and skill decline, emphasizing the need for critical evaluation and ongoing skill maintenance.
The Copilot Pause
The "Copilot Pause" reveals developers' mental blocks when relying on AI tools, leading to poor coding practices. Periodic disconnection from AI is encouraged to enhance problem-solving skills and technical mastery.
The 70% problem: Hard truths about AI-assisted coding
AI-assisted coding increases developer productivity but does not improve software quality significantly. Experienced developers benefit more, while novices risk creating fragile systems without proper oversight and expertise.
AI-assisted coding will change software engineering: hard truths
AI-assisted coding is widely adopted among developers, enhancing productivity but requiring human expertise. Experienced engineers benefit more than beginners, facing challenges in completing projects and understanding AI-generated code.
I think this is the wrong way to think about the problem. The AI isn't breaking your code, the engineers are. AI is just a tool, and you should always keep this in mind when talking about this sort of thing.
But I understand some of the problems you describe while using AI. The way I work around them is to never let the AI write more than a few lines at a time; ideally it should just complete the line I started to type.
In general, approach it expecting to have to explain everything, and evaluate whether it makes more sense to use it or just do the work yourself. Often the second option is much faster thanks to muscle memory and keyboard proficiency.
Even when it comes to boilerplate, it will not respect your standards (unless you throw more files into the context), so you need to be specific. In Cursor you can add more files to the context in all the chats, I believe (except the simple one in the current editor window), and it will do a better job.
I think too many people treat AI as a junior coder who has already been exposed to the business and its practices, as if you could give it a few short sentences and it would understand the task. But no: it is only as good as your input is detailed (and writing that detail is often not worth the hassle).
In Cursor you can save prompt templates in Composer, by the way.
That being said there are situations where LLMs can severely outperform us. An example is maintenance of legacy code.
E.g. what Cursor with Claude is very good at is explaining code to you; the less understandable the code, the more it shines. I don't know if you've noticed, but LLMs can de-obfuscate obfuscated code with ease and can make poorly written code understandable rather quickly.
The other day I fed it an 800-line function that computed the final price of configurations (imagine configuring a car or other items in an e-commerce shop); it would have been impossible to understand without a huge multi-day effort. Too many business features had been glued together in a single giant body, yet I was able to work through it piece by piece: find a better name for this, document that, explain this, refactor this block into a separate function, suggest improvements, and so on.
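To make that kind of extraction concrete, here is a minimal sketch of one such step; the function names, the discount rule, and the config shape are all invented for illustration:

    # Before: a rule buried somewhere inside an 800-line compute_final_price().
    #     if cfg.get("bundle") and len(cfg["items"]) > 2:
    #         total -= total * 0.05

    # After: the same rule extracted into a named, documented, testable helper.
    def apply_bundle_discount(total: float, cfg: dict) -> float:
        """Apply a 5% discount when a bundle contains more than two items."""
        if cfg.get("bundle") and len(cfg["items"]) > 2:
            return total - total * 0.05
        return total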
Another great use case is learning new tools/languages. I hadn't used Python in a decade, yet I quickly set up a Jupyter Notebook for financial simulations without having to really understand the language (you can simply ask it).
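Something along these lines, for illustration (the function and the numbers are made up, not my actual notebook):

    def simulate(principal: float, annual_rate: float, years: int,
                 monthly_contribution: float) -> float:
        """Grow a balance with monthly compounding plus a fixed monthly contribution."""
        balance = principal
        monthly_rate = annual_rate / 12
        for _ in range(years * 12):
            balance = balance * (1 + monthly_rate) + monthly_contribution
        return balance

    print(f"{simulate(10_000, 0.05, 20, 200):,.2f}")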
AI is not limited to what we can keep in mind at the same time (four to six items in short-term memory); 800 lines of context all together is nothing, and you can quickly iterate over such code.
Don't misplace your expectations about LLMs. They are a tool: one that makes experienced engineers shine when used for the right purpose, but you have to understand its limitations, as with any other tool out there.
Some pragmatic tips:
- Work with AI like a junior developer, except with unlimited energy and no problem being corrected repeatedly.
- Provide the AI with guidance about conventions you expect in your code base, overall architecture, etc [1]. You can often use AI tools to help write the first draft of such "conventions" documents.
- Break the work down into self-contained bite sized steps. Don't ask AI to boil the ocean in one iteration. Ask it to make a sequence of changes that each move the code towards the goal.
- Be willing to explore for a few steps with the AI. If it's going sideways, undo/revert. Hopefully your AI tool has good checkpoint/undo/git support [2].
- Lint and test the code after each AI change. Hopefully your AI tool can automatically do that, and fix problems [3]. (A minimal standalone version of this check is sketched after this list.)
- If the AI is stuck, just code yourself until you get past the tricky part. Then resume AI coding when the going is easier.
- Build intuition for what AI tools are good at, use them when they're helpful. Code yourself when not.
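Where the tool doesn't automate the lint-and-test step, a small wrapper script works; this sketch assumes ruff and pytest as the project's linter and test runner, so swap in whatever you actually use:

    import subprocess
    import sys

    def check() -> bool:
        """Run the linter and the test suite; stop at the first failure."""
        for cmd in (["ruff", "check", "."], ["pytest", "-q"]):
            if subprocess.run(cmd).returncode != 0:
                print(f"FAILED: {' '.join(cmd)}", file=sys.stderr)
                return False
        return True

    if __name__ == "__main__":
        sys.exit(0 if check() else 1)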
Some things AI coding is very helpful for:
- Rough first draft of a change or feature. AI can often speed through a bunch of boilerplate. Then you can polish the final touches.
- Writing tests.
- Fixing fairly simple bugs.
- Efficiently solving problems with packages/libraries you may not have known about.
[0] https://aider.chat/HISTORY.html
[1] https://aider.chat/docs/usage/conventions.html
Because if it is, the answer seems fairly simple: you install a proper code review process, and you don't allow shit code into your codebase.
1. We never catch AI trying to make breaking changes, but we catch developers who do. Since using AI tools we haven’t seen a huge change in those patterns.
2. Prior to opening a PR, developers now spend more time reviewing code than writing it. During the code review process, we use AI to highlight potential issues faster.
3. Human in the middle
The objective of my solver was to get good solutions using only RAG (no embeddings) and with minimal cost (low token count).
Three techniques, combined, yielded good results. The first was to take a TDD approach: first generating a test, then requiring the LLM to pass that test (without failing others). The solver can also trace the test execution to see exactly which code participates in the feature.
The second technique was to separate “planning” from “coding”. The planner is freed from implementation details, and can worry more about figuring out which files to change, following existing code conventions, not duplicating code, etc. In the coding phase, the LLM is working from a predefined plan, and has little freedom to deviate. It just needs to create a working, lint-free implementation.
The third technique was a gentle pressure on the solver to make small changes in a minimum number of files (ideally, one).
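In outline, the techniques compose like this; every callable here is a hypothetical stand-in for the solver's actual LLM and test-runner plumbing, and the third technique (gentle pressure toward small diffs) would live inside plan_change:

    from typing import Callable, Optional

    def solve(issue: str,
              write_failing_test: Callable[[str], str],   # technique 1: test first
              plan_change: Callable[[str, str], str],     # technique 2: plan before code
              write_code: Callable[[str, str], str],      # coding phase: follow the plan
              tests_pass: Callable[[str, str], bool],     # new test green, suite unbroken
              max_attempts: int = 3) -> Optional[str]:
        test = write_failing_test(issue)
        plan = plan_change(issue, test)     # planner picks files, conventions, scope
        for _ in range(max_attempts):
            patch = write_code(plan, test)  # coder has little freedom to deviate
            if tests_pass(test, patch):
                return patch
        return None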
AI coding tools today generally don’t incorporate any of this. They don’t favor TDD, they don’t have a bias towards making minimal changes, and they don’t work from a pre-approved design.
Good human developers do these things, and this is a pretty wide gap between adept human coders and AI.
These are all problems with regular humans, too. You prevent it via code reviews.
At work we have AI policies that revolve around confidentiality, and a contract to use Microsoft's Copilot, so that is what I use. I treat it as a supplement to looking up answers in the manual. For instance, I had to write some complicated Mockito tests and got sample code personalized to my needs, had it explain why it did certain things, how certain things work, etc. I've also had it give me good advice about how to deal with cases where I screwed up with git.
Often it gives me the insight to confirm things in the manual quickly, but my experience is that Copilot is weak in the citation department: often it gives 100% correct answers and justifies them with 100% wrong citations.
For ML-driven code development, I find it works best when I use it to write pure functions where I know exactly what I expect to go in and out of the function, and the LLM can simultaneously write the tests to ensure that it works. LLMs do not plan like humans; even when fine-tuned, they seem to have difficulty integrating knowledge beyond pattern matching.
That being said, 90% of coding is pattern matching to something someone has already made. As long as I'm writing pure functions and providing adequate context for what the model needs to produce, LLMs seem to work wonders. My rule of thumb is to spend 10-20 minutes specifying exactly what I need in the prompt, then tuning that if I fail to get the expected result.
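A trivial example of the pattern (my own illustration): a pure function with a fully specified contract, plus the tests an LLM can write straight from that spec:

    def normalize_whitespace(text: str) -> str:
        """Collapse runs of whitespace into single spaces and trim both ends."""
        return " ".join(text.split())

    # Tests like these are exactly the kind an LLM writes well from a clear spec.
    def test_normalize_whitespace():
        assert normalize_whitespace("  a\t b\n") == "a b"
        assert normalize_whitespace("") == ""
        assert normalize_whitespace("already clean") == "already clean"

Run with pytest; because the function is pure, the tests need no mocks or fixtures.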
I love coding with AI. It has made me 100x more productive. I am able to work on my distributed event-processing backend in Rust, then switch to my mobile app in Swift, then switch to my embedded device prototype and write a UART driver for a GPS module on an ESP32.
I’ve been programming for many years but this level of productivity would have been unimaginable to me without AI.
The latter problem should be prevented by code review (first by the developer using the AI tool, then by their teammates on a PR). Code generated by AI should be reviewed no differently than code written by a human. If you wouldn't approve the PR if a person wrote this code, why would you approve it because an LLM wrote it? If your PR process is not catching these issues, you have a PR process problem, not an AI problem.
The former problem, I have no idea about.
Have all that closed loop before anything makes it to a pull request that a human sees.
Add an agent to write unit tests for all impacted modules, etc.
Essentially, coding is just coding; something also has to do the software engineering.
> * AI suggests code that completely ignores existing patterns
> * It recreates components we already have
> * It modifies core architecture without understanding implications
> * It forgets critical context from previous conversations
> * It needs constant reminders about our tech stack decisions
Sounds like a really useful tool!
Serious question: Have you considered just, y'know, learning how to program on your own?
In the case of my TypeScript codebase it solves a lot of problems; it probably helps that I use tRPC and type rather aggressively (i.e. UUID types, separate date and datetime types, etc.).
A few weeks ago it wasn't working nearly as well in my experience.
How does AI making breaking changes or not following established patterns differ from human developers (possibly novices) doing the same?
Which safeguards do you have against human developers randomly copying code from StackOverflow, and why aren't they enough against developers using AI-generated code?
2. Use the LLM for code autocomplete, or ask it to code specific functions that you weave together to deliver a feature.
3. Or use it to explain your code. I concatenate all my code and shove it into Gemini and ask it to explain how the legacy stuff works.
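The concatenation step can be as simple as this; the src directory and the .py filter are assumptions, so adjust for your repo:

    from pathlib import Path

    # Gather every source file, labeled with its path, into one text blob.
    chunks = [f"# --- {path} ---\n{path.read_text()}"
              for path in sorted(Path("src").rglob("*.py"))]
    Path("codebase.txt").write_text("\n\n".join(chunks))

Then paste codebase.txt into Gemini and ask how the legacy pieces fit together.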
In those cases, one common tool for mitigating those issues is the somewhat tedious (but very helpful) exercise of establishing a Team Charter.
I wonder if a similar sort of thing would be useful to load into the base prompt / context of every AI-generated code request? We ask our new developers to always develop with our Team Charter in mind -- why not ask Copilot to do the same?
It wouldn't address everything you listed, but I wonder if it would help.
Do you have a Team Charter and coding standards doc already written out? If not, I wonder if it could help to ask a Copilot-type tool to analyze a codebase and establish a coding standards / charter-type document first, and then back-feed that into the system.
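As a rough sketch of the back-feeding idea, you could prepend the charter to the system prompt of every request. This example uses the OpenAI chat completions API; the file name and model are assumptions, and other providers have equivalents:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    charter = open("TEAM_CHARTER.md").read()  # hypothetical charter file

    def ask(task: str) -> str:
        """Send a coding request with the Team Charter as standing context."""
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system",
                 "content": f"Follow this team charter when writing code:\n{charter}"},
                {"role": "user", "content": task},
            ],
        )
        return resp.choices[0].message.content

Every request then carries the charter as standing context, mirroring what we ask of new developers.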
I think eventually AI agents will really take off but not sure if anything works well there yet.