The 70% problem: Hard truths about AI-assisted coding
AI-assisted coding increases developer productivity but does not improve software quality significantly. Experienced developers benefit more, while novices risk creating fragile systems without proper oversight and expertise.
AI-assisted coding has shown a significant productivity boost for developers, yet the quality of software produced has not improved proportionately. This phenomenon, termed the "70% problem," highlights that while AI tools can quickly generate prototypes, they often leave a critical 30% of work that requires human expertise to ensure maintainability and robustness. Developers can be categorized into two groups: "bootstrappers," who use AI to create initial prototypes rapidly, and "iterators," who integrate AI into their daily coding tasks. However, the reliance on AI can lead to issues, especially for less experienced developers, who may accept AI-generated code without the necessary scrutiny, resulting in fragile systems. The article emphasizes that AI tools are more beneficial for seasoned developers who can guide and refine AI outputs, while novices may struggle without foundational knowledge. The future of AI in software development is seen as a collaborative relationship where AI acts as a supportive tool rather than a replacement for human judgment. As AI tools evolve, they may become more autonomous, but the need for human oversight and expertise will remain crucial to ensure high-quality software development.
- AI tools boost productivity but do not significantly enhance software quality.
- Experienced developers benefit more from AI, while juniors may produce fragile code.
- The "70% problem" indicates that AI can generate prototypes but struggles with the final refinements.
- Future AI tools may evolve into more autonomous collaborators, requiring human guidance.
- Maintaining engineering standards is essential for producing robust software despite AI assistance.
Related
One of the best ways to get value for AI coding tools: generating tests
The 2024 Developer Survey indicates that programmers are increasingly using AI tools for testing, aiming to improve code quality and reduce tedious tasks, enhancing overall software reliability and efficiency.
Are Devs Becoming Lazy? The Rise of AI and the Decline of Care
The rise of AI tools like GitHub Copilot enhances productivity but raises concerns about developer complacency and skill decline, emphasizing the need for critical evaluation and ongoing skill maintenance.
Packages were supposed to replace programming. They got you 70% of the way there as well.
Same with 4GLs, Visual Coding, CASE tools, even Rails and the rest of the opinionated web tools.
Every generation has to learn “There is no silver bullet”.
Even though Fred Brooks explained why in 1986. There are essential tasks and there are accidental tasks. The tools really only help with the accidental tasks.
AI is a fabulous tool that is way more flexible than previous attempts because I can just talk to it in English and it covers every accidental issue you can imagine. But it can’t do the essential work of complexity management for the same reason it can’t prove an unproven maths problem.
As it stands we still need human brains to do those things.
That's a perfect summary, in my opinion. Both junior devs and AI tools tend to write buggy and overly verbose code. In both cases, you have to carefully review their code before merging, which takes time away from all the senior members of the team. But for a dedicated and loyal coworker, I'm willing to sacrifice some of my productivity to help them grow, because I know they'll help me back in the future. But current AI tools cannot learn from feedback. That means with AI, I'll be reviewing the exact same beginner's mistakes every time.
And that means time spent on proofreading AI output is mostly wasted.
Firstly, there is the double-edged sword of AI when learning. The easy path is to use it as a way to shortcut learning, to get the juice without the pressing, skipping the discomfort of not knowing how to do something. But that's obviously skipping the learning too. The discomfort is necessary. On the flip side, if one uses an LLM as a mentor who has all the time in the world for you, you can converse with it to get a deeper understanding, to get feedback, to unearth unknown unknowns, etc. So there is an opportunity for the wise and motivated to get accelerated learning, if they can avoid the temptation of a crutch.
The less tractable problem is hiring. Why does a company hire junior devs? Because there is a certain proportion of work that doesn't take as much experience and would waste more senior developers' time. If AI takes away the lower-skill tasks previously assigned to juniors, companies will be less inclined to pay for them.
Of course if nobody invests in juniors, where will the mid and senior developers of tomorrow come from? But that's a tragedy of the commons situation, few companies will wish to invest in developers who are likely to move on before they reap the rewards.
To add on to this point, there's a huge role of validation tools in the workflow.
If AI-written Rust code compiles and the test cases pass, it's a huge positive signal for me, because of how strict the Rust compiler is.
One example I can share is
https://github.com/rusiaaman/color-parser-py
which is a Python binding of Rust's csscolorparser, created by Claude without me touching an editor or terminal. I haven't reviewed the code yet; I just ensured that the test cases really passed (on GitHub Actions), installed the package, and started using it directly.
Maybe the fact that I was just a kid made this different, but I guess my point is that just because AI can now write you a code file in 10 seconds, doesn't mean your learning process also got faster. It may still take years to become the developer that writes well-structured code and thinks of edge cases and understands everything that is going on.
When I imagine the young people that will sit down to build their own first thing with the help of AI, I'm really excited knowing that they might actually get a lot further a lot faster than I ever could.
GenAI can get deeper into a solution that consists of well-known requirements, like basic web application construction, API development, data storage, and OAuth integration. There, GenAI can get close to 100%.
If you’re trying to build something that’s never been done before or is very complex, GenAI will only get to 50% and any attempt to continue will put you in a frustrating cycle of failure.
I’m having some further success by asking Claude to build a detailed Linear task list and tackling each task separately. To get this to work, I’ve built a file-combining script and attach the resulting files to a Claude project. So one file might be project-client-src-components.txt, and it contains all the files in my React/Next.js app under that folder in a single file, with full file path headers for each file.
We’ll see how deep I get before it can’t handle the codebase.
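For reference, the combining script amounts to something like the sketch below (simplified; the folder name and header format here are illustrative, not the exact script):

```ts
// Walk a folder and concatenate every file into one .txt, with a full-path header per file.
import { readdirSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

function collectFiles(dir: string): string[] {
  return readdirSync(dir, { withFileTypes: true }).flatMap((entry) => {
    const full = join(dir, entry.name);
    return entry.isDirectory() ? collectFiles(full) : [full];
  });
}

const root = "client/src/components";                  // folder to flatten (illustrative)
const output = "project-client-src-components.txt";    // one attachment per folder

const combined = collectFiles(root)
  .map((file) => `=== ${file} ===\n${readFileSync(file, "utf8")}`)
  .join("\n\n");

writeFileSync(output, combined);
```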
Where are these people in real life? A few influencers or wannabes say that on Twitter or LinkedIn, but do you know actual people in real life who say they’re "dramatically more productive with AI"?
Everyone I know or talked to about AI has been very critical and rational, and has roughly the same opinion: AIs for coding (Copilot, Cursor, etc.) are useful, but not that much. They’re mostly convenient for some parts of what constitutes coding.
Yesterday I asked o1-preview (the "best" reasoning AI on the market) how I could safely execute untrusted JavaScript code submitted by the user.
The AI suggested a library called vm2 and gave me a fully working code example. It's so good at programming that the code runs without any modification from me.
However, then I looked up vm2's repository. It turns out to be an outdated project, abandoned due to security issues. The successor is isolated-vm.
The code the AI gave me is 100% runnable. Had I not googled it, no amount of unit tests could have told me that vm2 is not the correct solution.
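For what it's worth, the maintained alternative looks roughly like this (a minimal sketch with isolated-vm, written from memory of its API, so double-check the exact option names against its docs):

```ts
import ivm from "isolated-vm";

async function runUntrusted(code: string): Promise<unknown> {
  // Each isolate gets its own V8 heap, capped so a runaway script can't exhaust memory.
  const isolate = new ivm.Isolate({ memoryLimit: 32 });
  const context = await isolate.createContext();
  try {
    // `timeout` kills long-running scripts; only transferable results (e.g. primitives) come back.
    return await context.eval(code, { timeout: 1000 });
  } finally {
    context.release();
    isolate.dispose();
  }
}

// e.g. runUntrusted("1 + 2").then(console.log); // 3
```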
Using it as a smarter autocomplete is where I see a lot of productivity boosts. It replaces snippets, it completes full lines or blocks, and because verifying a block likely takes less time than writing it, you can easily get a 100%+ speed-up.
It wrote functions to separately generate differential equations for water/air side, and finally combined them into a single state vector derivative for integration. Easy peasy, right?
No. On closer inspection, the heat transfer equations had flipped signs or were using the wrong temperatures. I'd also have preferred structured arrays for the vectors instead of plain lists/arrays.
However, the framework was there. I had to tweak some equations, prompt the LLM to re-write state vector representations, and there it was!
AI-assisted coding is great for getting a skeleton for a project up. You have to add the meat to the bones yourself.
My feeling is that this stuff is not bottlenecked on model quality but on UX. Chat is not that great an interface. Copy-pasting blobs of text back to an editor is a bit of monkey work. And monkey work should be automated.
With AI interactions now being able to call functions, what we need is deeper integration with the tools we use. Refactor this, rename that. Move that function here. Etc. There's no need for it to imagine these things perfectly; it just needs to use the tools that make them happen. IDEs have a large API surface, but a machine-readable description of it easily fits in a context window.
Recently ChatGPT added the ability to connect applications. So I can jump into a chat, connect IntelliJ to the chat, and ask it a question about the code in my open editor. Works great and is better than me just copy-pasting that into a chat window. But why can't it make the modification for me? It still requires me to copy text back to the editor and then hope it will work.
Addressing that would be the next logical step. Do it such that I can review what it did and undo any damage. But it could be a huge time saver. And it would also save some tokens, because a lot of the code it generates is just echoing what I already had with only a few lines modified. I want it to modify those lines and not risk hallucinations introducing mistakes into the rest, which is a thing you have to worry about.
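To make that concrete, here is a rough sketch of the kind of tool definition I mean, in the common JSON-Schema function-calling shape (the edit_file tool and its fields are hypothetical, not an existing ChatGPT or IntelliJ API):

```ts
// Hypothetical "edit_file" tool an assistant could call instead of echoing whole files back.
const editFileTool = {
  type: "function",
  function: {
    name: "edit_file",
    description: "Replace a line range in an open editor buffer with new text.",
    parameters: {
      type: "object",
      properties: {
        path: { type: "string", description: "File path relative to the project root." },
        startLine: { type: "integer", description: "First line to replace (1-based)." },
        endLine: { type: "integer", description: "Last line to replace (inclusive)." },
        replacement: { type: "string", description: "New text for the range." },
      },
      required: ["path", "startLine", "endLine", "replacement"],
    },
  },
} as const;
// The IDE would apply the edit and record it in undo history, so any damage is reviewable and revertible.
```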
The other issue is that iterating on code gets progressively harder as there's more of it, and it needs to regenerate more of it at every step. That's a UX problem as well. It stems from the context being an imperfect abstraction of my actual code. Applying a lot of small, simple changes to code would be much easier than re-imagining the entire thing from scratch every time. In most of my conversations the code under discussion diverges from what I have in my editor. At some point continuing the conversation becomes pointless and I just start a new one with the actual code. Which is tedious, because now I'm dealing with the Groundhog Day of having to explain the same context again. More monkey work. And if you do it wrong, you have to do it over and over again. It's amazing that it works, but also quite tedious.
I would not be confident betting a career on any of those patterns holding. It is like people hand-optimising their assembly back in the day. At some point the compilers get good enough that the skill is a curio rather than an economic edge.
100% agree. I am testing o1 on some math problems. I asked it to prove that the convolution of two Gaussians is Gaussian. It gave me a three-page algebraic solution; it is correct but neither elegant nor good. I have seen more ingenious solutions. These tools are really good at doing something, but not at doing it like an expert human, as claimed.
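For comparison, the compact textbook route goes through the Fourier transform, where convolution becomes multiplication (sketched below for the zero-mean case):

```latex
% The transform of a zero-mean Gaussian density with variance \sigma^2 is e^{-\sigma^2\omega^2/2},
% and convolution becomes multiplication in the frequency domain:
\[
\widehat{f * g}(\omega) \;=\; \hat{f}(\omega)\,\hat{g}(\omega)
 \;=\; e^{-\sigma_1^2\omega^2/2}\, e^{-\sigma_2^2\omega^2/2}
 \;=\; e^{-(\sigma_1^2+\sigma_2^2)\,\omega^2/2},
\]
% which is the transform of a Gaussian with variance \sigma_1^2+\sigma_2^2; inverting the transform
% shows f * g is that Gaussian. Nonzero means only add a phase factor e^{-i\mu\omega}, so means add too.
```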
ScreenshotToCode wants me to pay for a subscription, even before I have any idea of its capabilities. V0 keeps throwing an error in the generated code, which the AI tries to remedy, without success after 3 tries. Bolt.New redirects to StackBlitz, and after more than an hour, there are still spinners stuck on trying to import and deploy the generated code.
Sounds like snake oil all around. The days of AI-enabled low-/no-code are still quite a while away, I think, if at all feasible.
To me it's about asking the right questions, no matter the level of experience. If you're junior it will help you ramp up at a speed I could not imagine before.
It doesn't matter how many times you show them that it invented assembly instructions or wxWidgets functions; they insist on cheating. I even told them the analogy of going to the gym: you lift with your own strength, you don't use a crane.
And of course, it is evident when you receive students who don't know what a function is or who cannot complete simple exercises during a written test.
We learned by reading, lots of trial and failure, curiosity, asking around, building a minimal reproducible bug for Stack Overflow... they (the ones that rely only on ChatGPT and not their brain) cannot even formulate a question by themselves.
This. Junior devs are f*cked. I don't know how else to say it.
This experiment made me think, maybe most of the benefit from AI comes from this mental workload shift that our minds subconsciously crave. It's not that we achieve astronomical levels of productivity but rather our minds are free from large programming tasks (which may have downstream effects of course).
I like the ideas presented in the post but it’s too long and highly repetitive.
AI will happily expand a few information dense bullet points into a lengthy essay. But the real work of a strong writer is distilling complex ideas into few words.
The hard truth is that you will learn nothing if you avoid doing the work yourself.
I'm often re-reading EWD273 [0] from Dijkstra, "The programming task considered as an intellectual challenge." How little distance we have made since that paper was published! His burning question:
> Can we get a better understanding of the nature of the programming task, so that by virtue of this better understanding, programming becomes an order of magnitude easier, so that our ability to compose reliable programs is increased by a similar order of magnitude?
I think the answer AI assistants provide is... no. Instead we're using the "same old methods" Dijkstra disliked so much. We're expected to rely on the Lindy effect and debug the code until we feel more confident that it does what we want it to. And we still struggle to convince ourselves that these programs are correct. We have to content ourselves with testing and hoping that we don't cause too much damage in the AI-assisted programming world.
Not my preferred way to work and practice programming.
As for, "democratizing access to programming..." I can't think of a field that is more open to sharing it's knowledge and wisdom. I can't think of a field that is more eager to teach its skills to as many people as possible. I can't think of any industry that is more open to accepting people, without accreditation, to take up the work and become critical contributors.
There's no royal road. You have to do the work if you want to build the skill.
I'm not an educator but I suspect that AI isn't helping people learn the practice of programming. Certainly not in the sense that Dijkstra meant it. It may be helping people who aren't interested in learning the skills to develop software on their own... up to a point, 70% perhaps. But that's always been the case with low-code/no-code systems.
[0] https://www.cs.utexas.edu/~EWD/ewd02xx/EWD273.PDF
Update: Added missing link, fixed consistent mis-spelling of one of my favourite researchers' name!
This is nothing new. Algorithmic code generation has been around since forever, and it's robust in a way that "AI" is not. This is what many Java developers do, they have tools that integrate deeply with XML and libraries that consume XML output and create systems from that.
Sure, such tooling is dry and boring rather than absurdly polite and submissive, but if that's your kink, are you sure you want to bring it to work? What does it say about you as a professional?
As for IDE-integrated "assistants" and free-floating LLMs: when they don't give me outright wrong code, they consistently give suggestions that are much, much more complicated than the code I intend to write. If I were to let the ones I've tried write my code, I'd be a huge liability for my team.
I expect the main result of the "AI" boom in software development to be a lot of work for people that are actually fluent, competent developers maintaining, replacing and decommissioning the stuff synthesised by people who aren't.
What I'd like to do is to ask "write me a libuv-based event loop processing messages described by the protobuf files in the ./protos directory. Use a 4-byte length prefix as a frame header" and then have it go and update files in the IDE itself, adding them to CMakeLists.txt if needed.
That would be an AI assisted coding and we can then discuss its quality, but does it exist? I'd be happy to give it a go.
I've been working on an agentic full-app codegen AI startup for about a year, and used Copilot and other coding assistance tools since it was generally available.
Last year, nobody even thought full-app coding tools were possible. Today they're all the rage: I track ~15 full-codegen AI startups (what now seems to be called "agentic coding") and ~10 coding assistants. Of these, around half focus on a specific niche (e.g. resolving GitHub issues, full-stack coding a few app types, or building the frontend prototype), and half attempt to do full projects.
The paradox that Addy hints at is that senior, knowledgeable developers are much more likely to get value out of both of these categories. For assistants, you need to inspect the output and fix/adapt it. For agentic coders, you need to be able to micromanage or bypass them on issues that block them.
However, more experienced developers are (rightly) wary of new hyped up tools promising the moon. It's the junior devs, and even non-developers who drink the kool aid and embrace this, and then get stuck on 70%, or 90%... and they don't have the knowledge or experience to go past. It's worse than useless, they've spent their time, money, and possibly reputation (within their teams/orgs) on it, and got nothing out of it.
At the startup I mentioned, virtually all our dev time was spent on trying to move that breaking point from 50%, to 70%, to 90%, to larger projects, ... but in most cases it was still there. Literally an exponential amount of effort to move the needle. Based on this, I don't think we'll be able to see fully autonomous coding agents capable of doing non-toy projects any time soon. At the same time, the capabilities are rising and costs dropping down.
IMHO the biggest current limit for agentic coding is the speed (or lack thereof) of state-of-the-art models. If you can get 10x speed, you can throw in 10x more reasoning (inference-time compute, to use the modern buzzword) and get 1.5x-2x better, in terms of quality or capability to reason about more complex projects.
I published it to a git repo with unit tests, great coverage, security scanning, and pretty decent documentation of how the tool works.
I estimate just coding the main tool would have been 2 or 3 days and all the other overhead would have been at least another day or two. So I did a week of work in a few hours today. Maybe it did 70%, maybe it did 42.5%, either way it was a massive improvement to the way I used to work.
In some ways, I'm not impressed by AI because much of what AI has achieved I feel could have been done without AI, it's just that putting all of it in a simple textbox is more "sleek" than putting all that functionality in a complex GUI.
I really dislike the entire narrative that's been built around the LLMs. Feels like startups are just creating hype to milk as much money out of VCs for as long as they can. They also like to use the classic and proven blockchain hype vocabulary (we're still early etc.).
Also, the constant anthropomorphizing of AI is getting ridiculous. We're not even close to replacing juniors with shitty generated code that might work. Reminds me of how we got "sold" automated shopping terminals: more convenient and faster than standing in line with a person, but now you've got to do all the work yourself. Also, the promise of doing stuff faster is nothing new. Productivity is skyrocketing, but burnout is the hot topic at your average software conference.
When the AI boom started in 2022, I was already focused on how to create provably, or likely, correct software on a budget.
Since then, I've figured out how to create correct software fast, on rapid iteration. (https://www.osequi.com/)
Now I can combine productivity and quality into one single framework / method / toolchain ... at least for a niche (React apps)
Do I use AI? Only for pair programming: suggestions for algorithms, suggestions for very small technical details like Typescript polymorphism.
Do I need more AI? Not really ...
My framework automates most of the software development process: design (specification and documentation), development, verification. What's left is understanding, i.e. designing the software architecture, and for that I'm using math, not AI, which gives me provably correct, translatable-to-code models in a deterministic way. None of this will be offered by AI in the foreseeable future.
But, this was a generic LLM, not a coding assistant. I wonder if they are different and if they remember what you were unhappy with the last time.
Also LLMs seem to be good with languages like Python, and really bad with C and Rust, especially when asked to do something with pointers, ownership, optimization etc.
I see a lot of devs who appear to be in a complete state of denial about what is happening. Understandable, but worrying.
One worry I have is what will happen to my own skills over time with these tools integrated into my workflow. I do think there's a lot of value in going through the loop of struggling with -> developing a better understanding of technologies. While it's possible to maintain this loop with coding assistants, they're undoubtedly optimized towards providing quick answers/results.
I'm able to accomplish a lot more with these coding assistants now, but it makes me wonder what growth I'm missing out on by not always having to do it the "hard" way.
These people were never at 70% in the first place.
The article also misses experts using this to accelerate themselves at things they are not expert in.
> the future isn't about AI replacing developers - it's about AI becoming an increasingly capable collaborator that can take initiative while still respecting human guidance and expertise.
I believe we will see humans transition to a purely ceremonial role for regulatory/liability reasons. Airplanes fly themselves with autopilot, but we still insist on putting humans at the yoke because everyone feels more comfortable with the arrangement.
I don’t ever use the code completion functionality; in fact, it can be a bit annoying. However, asking it questions is the new Google search.
Over the last couple of years I’ve noticed that the quality of answers you get from googling has steeply declined, with most results now being terrible ad filled blog spam.
Asking the AI assistant the same query yields so much better answers and gives you the opportunity to delve deeper into said answer if you want to.
No more asking on stack overflow and having to wait for the inevitable snarky response.
It’s the best money I’ve spent on software in years. I feel like Picard asking the computer questions
Programming is not just about producing a program; it's about developing a mental model of the problem domain and how all the components interact. You don't get that when Claude is writing all your code, so unless the LLM is flawless (which it will likely never be on novel problems), you won't understand the problem well enough to know how to fix things when they go wrong.
Assistants that work best in the hands of someone who already knows what they're doing, removing tedium and providing an additional layer of quality assurance.
Pilot's still needed to get the plane in the air.
But even if the output from these tools is perfect, coding isn't only (or even mainly) about writing code, it's about building complex systems and finding workable solutions through problems that sometimes look like cul de sacs.
Once your codebase reaches a few thousand lines, LLMs struggle seeing the big picture and begin introducing one new problem for every one that they solve.
>Error messages that make no sense to normal users
>Edge cases that crash the application
>Confusing UI states that never got cleaned up
>Accessibility completely overlooked
>Performance issues on slower devices
>These aren't just P2 bugs - they're the difference between software people tolerate and software people love.
I wonder if we'll see something like the video game crash of 1983. Market saturation with shoddy games/software, followed by stigmatization: no one is willing to try out new apps anymore, because so many suck.
However, one difference between these tools and previous human-developed technologies is that these tools offer direct intelligence, delivered via the cloud to your environment.
That is unprecedented. It's rather like the first time we started piping energy through wires. Sure, it was clunky then, but give it time. LLMs are just the first phase of this new era.
"Make a simple HTML page which uses the VideoEncoder API to create a video that the user can download."
So far, not a single AI has managed to create a working solution. I don't know why. The AIs seem to have an understanding of the VideoEncoder API, so it seems it's not a problem of not having the info they need. But none comes up with something that works.
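One plausible reason, worth double-checking: WebCodecs gives you an encoder but no container muxer, so even correct encoder code isn't a downloadable video without an extra muxing step. A rough sketch of the half that does work (browser-only, illustrative):

```ts
// Collect EncodedVideoChunks from frames drawn on a canvas. WebCodecs stops here;
// a downloadable file still needs a separate WebM/MP4 muxing step (library or hand-rolled),
// which is the part the generated answers tend to skip or get wrong.
async function encodeFrames(): Promise<EncodedVideoChunk[]> {
  const canvas = document.createElement("canvas");
  canvas.width = 640;
  canvas.height = 480;
  const ctx = canvas.getContext("2d")!;

  const chunks: EncodedVideoChunk[] = [];
  const encoder = new VideoEncoder({
    output: (chunk) => chunks.push(chunk),
    error: (e) => console.error(e),
  });
  encoder.configure({ codec: "vp8", width: 640, height: 480, bitrate: 1_000_000, framerate: 30 });

  for (let i = 0; i < 30; i++) {
    ctx.fillStyle = `hsl(${i * 12}, 80%, 50%)`;
    ctx.fillRect(0, 0, canvas.width, canvas.height);
    const frame = new VideoFrame(canvas, { timestamp: (i * 1_000_000) / 30 }); // microseconds
    encoder.encode(frame, { keyFrame: i === 0 });
    frame.close();
  }
  await encoder.flush();
  return chunks; // still needs muxing into a container before it can be offered as a download
}
```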
As AI is able to write more complex code, the skill of the engineer must increase so they can step in when necessary and diagnose the code it wrote; if you can't, your app is stuck at the level of the AI.
Honestly, this seems like a straw man. The kind of distributed productivity tools like Miro, Figma, Stackblitz, etc. that we all use day-to-day are both impressive in terms of what they do, but even more impressive in terms of how they work. Having been a remote worker 15 years ago, the difference in what is available today is light-years ahead of what was available back then.
I would disagree with this. There are many web apps and desktop apps that I’ve been using for years (some open source) and they’ve mostly all gotten noticeably better. I believe this is because the developers can iterate faster with AI.