Sorry, GenAI is NOT going to 10x computer programming
Generative AI is not delivering the expected tenfold increase in programming productivity, with studies showing modest gains and potential declines in code quality, emphasizing the value of traditional tools.
Recent discussions have suggested that Generative AI (GenAI) would significantly enhance programming productivity, potentially by a factor of ten. However, evidence from multiple studies indicates that such claims are overstated. One study involving 800 programmers found minimal improvement in productivity and an increase in bugs. Another study reported a moderate 26% improvement for junior developers but only marginal gains for senior developers. Earlier research also highlighted a decline in code quality and security, suggesting that long-term productivity might be negatively impacted. The consensus is that while GenAI can assist in coding, it lacks the deep conceptual understanding necessary for substantial improvements. Instead, it may be more effective as a tool for speeding up coding tasks rather than as a replacement for critical thinking about algorithms and data structures. The author, Gary Marcus, emphasizes that traditional tools like Integrated Development Environments (IDEs) may offer more reliable benefits at a lower cost. Overall, the data suggests that while GenAI can be a helpful aid, it is unlikely to revolutionize programming as initially claimed.
- Generative AI is not achieving the anticipated 10x improvement in programming productivity.
- Studies show only modest gains in productivity, with some negative impacts on code quality.
- GenAI lacks the deep conceptual understanding necessary for significant programming advancements.
- Traditional tools like IDEs may provide more reliable productivity enhancements.
- Caution is advised in relying on GenAI as a substitute for critical thinking in coding.
Related
Up to 90% of my code is now generated by AI
A senior full-stack developer discusses the transformative impact of generative AI on programming, emphasizing the importance of creativity, continuous learning, and responsible integration of AI tools in coding practices.
Effects of Gen AI on High Skilled Work: Experiments with Software Developers
A study on generative AI's impact on software developers revealed a 26.08% productivity increase, particularly benefiting less experienced developers, through trials at Microsoft, Accenture, and a Fortune 100 company.
Devs gaining little (if anything) from AI coding assistants
A study by Uplevel found that AI coding assistants like GitHub Copilot do not significantly boost developer productivity; instead they increase bugs and lead to more time spent reviewing code rather than writing it.
The project I am currently working on took me about 16 hours to get to an MVP. A hobby project of similar size I did a few years ago took me about 80 hours. A lot of the work is NOT coding work that an LLM can help me with.
10x over everything is overstating it. However, if I can take my star developer who is already delivering significantly more value per dollar than my average guys and 3x him, that's quite a boost.
I program daily and I use AI for it daily.
For short and simple programs AI can be 100x faster, but it is fundamentally limited by its context size. As a program grows in complexity, AI is not currently able to create or modify the codebase successfully (around 2,000 lines is where I found it hits a barrier). I suspect it's due to the exponential complexity associated with input size.
Show me an AI that can do this for a 10,000-line complex program and I'll eat my own shorts.
“I would work for twenty hours,” Jackson said. “With a spreadsheet, it takes me 15 minutes.”
Has any other computing tool ever beaten that since?
Source: Steven Levy reporting, Harper’s Magazine, 1984, reprinted in Wired: https://www.wired.com/2014/10/a-spreadsheet-way-of-knowledge...
We don't yet have the next programming language for the AI age. I believe the biggest gains will come not from bigger models but from smarter tools that use stochastic generators like LLMs to explore the solution space and then quickly verify candidate solutions against well-defined constraints that can be expressed in the language.
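One way to read "explore and verify" is a propose-then-check loop. Below is a minimal sketch under stated assumptions: `llm_propose` is a hypothetical stand-in for an LLM call (here it just cycles through canned candidates), and the "well-defined constraints" are executable input/output examples.

```python
def llm_propose(prompt, attempt):
    # Hypothetical stand-in for an LLM completion call. A real system
    # would sample a fresh candidate per attempt; we cycle a fixed pool.
    pool = [
        "def dedupe(xs): return xs",                       # keeps duplicates
        "def dedupe(xs): return sorted(set(xs))",          # loses order
        "def dedupe(xs): return list(dict.fromkeys(xs))",  # order-preserving
    ]
    return pool[attempt % len(pool)]

def verify(source, cases):
    # The constraints: run the candidate against executable I/O examples.
    env = {}
    try:
        exec(source, env)
        return all(env["dedupe"](inp) == out for inp, out in cases)
    except Exception:
        return False

def generate_and_verify(prompt, cases, max_attempts=10):
    # Explore the solution space until a candidate passes verification.
    for attempt in range(max_attempts):
        candidate = llm_propose(prompt, attempt)
        if verify(candidate, cases):
            return candidate
    return None

cases = [([3, 1, 3, 2], [3, 1, 2]), ([], [])]
solution = generate_and_verify("remove duplicates, preserve order", cases)
```

The point of the sketch is that the verifier, not the generator, carries the correctness guarantee: the stochastic part only has to eventually propose something that checks out.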
When I hear about some of the massive productivity gains people ascribe to AI, I also wonder where the amazement for "rails new" has gone. We already have tools that can 100x your productivity in specific areas without also hallucinating random nonsense (though Rails is maybe not the absolute best counterexample here).
However, in light of Gary's other AI related writings[0], it is clear he's not saying "people, be clear headed about the promises and limits of this new technology", he's saying "genai is shit". Yes, I am attacking the character/motive instead of the argument, because there's plenty of proof about the motive.
So I am kind of agreeing (facts as laid out) and vehemently disagreeing (the underlying premise his readers will get from this) with the guy at the same time.
[0]: Here are a few headlines I grabbed from his blog:
- How much should OpenAI’s abandoned promises be worth?
- Five reasons why OpenAI’s $150B financing might fall apart
- OpenAI’s slow-motion train wreck
- Why California’s AI safety bill should (still) be signed into law - and why that won’t be nearly enough
- Why the collapse of the Generative AI bubble may be imminent
Anything that requires applying things in novel ways that doesn’t have lots of examples out there already seems to be completely beyond it.
Also, it often comes up with very suboptimal and inefficient solutions, even though it seems to be completely aware of better solutions when prodded.
So basically it’s a fully competent programmer lol.
We are going to be so screwed in 10 years.
The GitClear whitepaper that Marcus cites tries to account for some of these factors, but GitClear has a bias: it sells its own code-quality tools. Likewise, GitHub's whitepapers (and subsequent marketing) tend to study developers' perception of productivity and quality, along with other fuzzy factors like the suggestion acceptance rate -- but not bug rate or the durability of accepted suggestions over time. (I think perceived productivity and enjoyment of one's job are also important values, but they're not what these products are being sold on.)
I'm trying to understand what sort of person would end with a 3rd person anecdote like this.
> 10x-ing requires deep conceptual understanding – exactly what GenAI lacks. Writing lines of code without that understanding can only help so much.
IMHO, the real opportunity with AI is to make developers less critical as the only ones who translate conceptual understanding of the business problem into code, and to get developers more focused on the domain they should be experts in, which, imho, is the large-scale structure of the system and how that maps to actual compute.
What I mean by that: developers are, on average, not good at having a deep conceptual understanding of the business domain... and maybe have that understanding in the computational domain. All of the 10x developers I have ever met are mostly that because they know both the business domain (or can learn it really quickly) and the computational domain, so they can quickly deliver value because they understand the problem end-to-end.
My internal model for thinking about this is that we need developers building the right "intermediate representation" of a problem space and managing the system at large, but we need domain experts, likely supported by AI assistants, to use that IR to express the actual business problem. That is a super loose explanation... but if you have been in the industry a while, you have probably felt the insanity of how long it can take to ship a fairly small feature because of a huge disconnect between the broader structure of the system (microservices, async job systems, data pipelines, etc., which we can think of as the low-level representation of the system at large) and the high-level requirement of needing to store some data and process it according to some business-specific rules.
I have no idea if we actually can get there as an industry... it is more likely we just have more terrible code... but maybe not
These tools help a bit for initial mocks, but even then I don't like them, as they create code I don't know, and I need to review it all to learn where things are.
When you are doing complex software, you need to build a good mental model of the code to know where things are, especially when you start debugging issues; not knowing where things are is a mess and very annoying.
These days I almost never use these tools anymore; I just prefer basic line auto-completion.
Coding was only 3-5% of the time.
Also, I remember learning to race. I tried to learn to go faster through the turns.
But it turns out the way to be fast was to go a little faster on the straightaway and long turns. Trying to go faster through the tight turns doesn't help much.
If you go 1 mph faster for 10 feet in the slowest turn, it doesn't make as much difference as going 0.1 mph faster in a long 500-foot turn.
so...
Are we trying to haphazardly optimize for the small amount of time we code, when we should spend it elsewhere?
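The intuition behind the racing analogy is just Amdahl's law: speeding up only a small fraction of the job bounds the end-to-end gain, no matter how fast that fraction gets. A quick sketch, using the commenter's 3-5% figure as the assumed coding share:

```python
def overall_speedup(fraction, tool_speedup):
    # Amdahl's law: only `fraction` of the total work benefits from
    # `tool_speedup`; the rest runs at the old pace.
    return 1 / ((1 - fraction) + fraction / tool_speedup)

# If coding is 5% of the job and an assistant makes coding 10x faster,
# the whole job speeds up by only ~4.7%.
print(round(overall_speedup(0.05, 10), 3))  # -> 1.047
```

Even an infinitely fast assistant could never make that job more than about 1.05x faster overall, which is why the straightaways (requirements, design, review, debugging) dominate.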
One of the requirements for hiring is that candidates be proficient with Copilot or Cursor, or have rolled their own Llama-based code assistant.
It’s like going from manual hand tools to power drills and nail guns. Those who doubt that AI will change all technology jobs and work in the industry are going to find themselves doing something else.
Tried spinning up a side project with just Cursor and GPT/Claude models, and we're definitely not at 'AI can build apps from just a prompt' yet. It couldn’t even create a dropdown filter like Linear’s, and the code was buggy, inefficient, and far from the full functionality I asked for. Maybe it's the niche language (Elixir), but we’re not there yet.
It certainly doesn’t write production quality code, but that entire first paragraph was science fiction a short time ago.
Maybe in the future everything will become a web app, because GenAI became so good at it.
I guess part of the reason we can’t agree on whether they are useful is that their usefulness depends on how familiar you are with the programming task.
Any such prediction seems extremely likely to be false without a time bound, or an argument to physical impossibility.
Now you are all mostly singing its praises.
Good. You will be left standing when the stubborn are forced to call it quits.
What is so wrong with having a computer do amazing things for you simply by asking it?
That was the dream and the mission, wasn't it?
Unless "obfuscation is the enemy"
Most people don't want the IT guy to get into the details of how they did what they did. They just want to get on with it. Goes for the guy replacing your hard drive. Goes for the guy writing super complicated programs. IT is a commodity. A grudge purchase for most companies. But you know that, deep down. Don't you?
The bar has been lowered and raised at the same time. Amazing.