October 1st, 2024

Sorry, GenAI is NOT going to 10x computer programming

Generative AI is not delivering the expected tenfold increase in programming productivity, with studies showing modest gains and potential declines in code quality, emphasizing the value of traditional tools.

Recent discussions have suggested that Generative AI (GenAI) would significantly enhance programming productivity, potentially by a factor of ten. However, evidence from multiple studies indicates that such claims are overstated. One study involving 800 programmers found minimal improvement in productivity and an increase in bugs. Another study reported a moderate 26% improvement for junior developers but only marginal gains for senior developers. Earlier research also highlighted a decline in code quality and security, suggesting that long-term productivity might be negatively impacted.

The consensus is that while GenAI can assist in coding, it lacks the deep conceptual understanding necessary for substantial improvements. Instead, it may be more effective as a tool for speeding up coding tasks than as a replacement for critical thinking about algorithms and data structures. The author, Gary Marcus, emphasizes that traditional tools like Integrated Development Environments (IDEs) may offer more reliable benefits at a lower cost. Overall, the data suggests that while GenAI can be a helpful aid, it is unlikely to revolutionize programming as initially claimed.

- Generative AI is not achieving the anticipated 10x improvement in programming productivity.

- Studies show only modest gains in productivity, with some negative impacts on code quality.

- GenAI lacks the deep conceptual understanding necessary for significant programming advancements.

- Traditional tools like IDEs may provide more reliable productivity enhancements.

- Caution is advised in relying on GenAI as a substitute for critical thinking in coding.

38 comments
By @koliber - 7 months
I am an experienced programmer, and just recently started using ChatGPT and Cursor to help me code. Some things it does like magic, and it's hard to say what n-fold improvement there is. I'll put the lower limit at 3x and the upper, on certain tasks, at 20x.

The project I am currently working on took me about 16 hours to get to an MVP. A hobby project of similar size I did a few years ago took me about 80 hours. A lot of the work is NOT coding work that an LLM can help me with.

10x over everything is overstating it. However, if I can take my star developer who is already delivering significantly more value per dollar than my average guys and 3x him, that's quite a boost.

By @ccvannorman - 7 months
Just because this article isn't well formed or sourced doesn't make its claim incorrect.

I program daily and I use AI for it daily.

For short and simple programs AI can do it 100x faster, but it is fundamentally limited by its context size. As a program grows in complexity, AI is not currently able to create or modify the codebase successfully (around 2,000 lines is where I found the barrier). I suspect it's due to exponential complexity associated with input size.

Show me an AI that can do this for a 10,000-line complex program and I'll eat my own shorts.

By @juliendorra - 7 months
For comparison, VisiCalc and other early spreadsheets like Lotus 1-2-3 and Multiplan were 80x multipliers for their users:

“I would work for twenty hours,” Jackson said. “With a spreadsheet, it takes me 15 minutes.”

Not sure any other computing tool ever beat that since?

Source: Steven Levy reporting, Harper’s Bazaar 1984, reprinted in Wired: https://www.wired.com/2014/10/a-spreadsheet-way-of-knowledge...

By @carterparks - 7 months
It doesn't always 10x my development but on certain problems I can work 30x faster. In due time, this is only going to accelerate more as tooling becomes more closely integrated into dev workflows.
By @norir - 7 months
The bummer of ai hype is that I think there is so much we can do to improve programming with deterministic tooling. We have seen a remarkable improvement in terms of real time feedback directly in your editing context through IDEs and editor plugins.

We still don't have the next programming language for the AI age. I believe the biggest gains will come not from bigger models but from smarter tools that use stochastic components like LLMs to explore the solution space and quickly verify candidate solutions against well-defined constraints that can be expressed in the language.

When I hear about some of the massive productivity gains people ascribe to ai, I also wonder where the amazement for "rails new" has gone. We already have tools that can 100x your productivity in specific areas without also hallucinating random nonsense (though Rails is maybe not the absolute best counterexample here).

By @throw4847285 - 7 months
The reason GenAI has been so helpful for developers is that devs spend most of their time doing grunt work. That is, basic pattern matching: copy pasting chunks of code from one place to another and then modifying those chunks enough such that a couple code reviews will catch the things they forgot to modify. This is a miserable existence, and GenAI can alleviate it, but only because it was grunt work in the first place.
By @ayhanfuat - 7 months
It must be really hard to be the person who said "Deep learning is hitting a wall" in 2022. This is just doubling down.
By @Fripplebubby - 7 months
Do people generally really believe that GenAI would 10x programmer productivity? I find that surprising, that's not what I've read here on HN, by-and-large. 20-30% seems much more like what people are actually experiencing today. Is that controversial?
By @manofmanysmiles - 7 months
I think this guy has not worked with enough people who aren't that good at programming. GenAI lets me spit out reams of what to me is boilerplate, but to someone more junior might legitimately take weeks. It's a tool that right now empowers people who already are extremely effective. It is not a substitute for deep knowledge, wisdom and experience.
By @senko - 7 months
Gary starts from credible facts (modest gains, AI is not a replacement for clear thinking), although attacking the "10x" claim is a cheap shot because the claim was always nebulous.

However, in light of Gary's other AI related writings[0], it is clear he's not saying "people, be clear headed about the promises and limits of this new technology", he's saying "genai is shit". Yes, I am attacking the character/motive instead of the argument, because there's plenty of proof about the motive.

So I am kind of agreeing (facts as laid out) and vehemently disagreeing (the underlying premise his readers will get from this) with the guy at the same time.

[0]: Here are a few headlines I grabbed from his blog:

- How much should OpenAI’s abandoned promises be worth?

- Five reasons why OpenAI’s $150B financing might fall apart

- OpenAI’s slow-motion train wreck

- Why California’s AI safety bill should (still) be signed into law - and why that won’t be nearly enough

- Why the collapse of the Generative AI bubble may be imminent

By @eitally - 7 months
I think too many SWEs making comments like Gary has are discounting just how many software engineers are employed outside of tech, mostly working on either 1) workflow, 2) CRUD, or 3) reporting/analytics tools inside enterprise IT organizations. Between SaaS & GenAI, a very large percentage of those roles are starting to dry up.
By @resters - 7 months
I think it's already surpassed 10x, and is closer to 25x or 30x. You just have to know what to ask it to do.
By @dgoodell - 7 months
So far it has definitely saved me a lot of time googling api docs and stackoverflow for me.

Anything that requires applying things in novel ways that doesn’t have lots of examples out there already seems to be completely beyond it.

Also, it often comes up with suboptimal and inefficient solutions, even though it seems to be completely aware of better solutions when prodded.

So basically it’s a fully competent programmer lol.

By @throwaway918299 - 7 months
My average experience reviewing code written by “average developers” that start using copilot is that it ends up 0.1x them.

We are going to be so screwed in 10 years.

By @hackermatic - 7 months
I hope commenters will dig into the author's citations' data, in line with HN's discussion guidelines, instead of just expressing a negative opinion about the thrust of the article. The quantifiable impact of genAI on code productivity is an important research question, and very much an open question subject to bias from all the players in the space -- especially once you factor in quality, maintainability, and bugs or revisions over time.

The GitClear whitepaper that Marcus cites tries to account for some of these factors, but they're biased by selling their own code quality tools. Likewise, GitHub's whitepapers (and subsequent marketing) tend to study the perception of productivity and quality by developers, and other fuzzy factors like the suggestion acceptance rate -- but not bug rate or the durability of accepted suggestions over time. (I think perceived productivity and enjoyment of one's job are also important values, but they're not what these products are being sold on.)

By @Jimmc414 - 7 months
"Gary Marcus has been coding since he was 8 years old, and was very proud when his undergrad professor said he coded an order of magnitude faster than his peers. Just as Chollet might have predicted, most of the advantage came from clarity — about task, algorithms and data structures."

I'm trying to understand what sort of person would end with a 3rd person anecdote like this.

By @hiddencost - 7 months
Why do people keep posting this stuff? He's been writing variations on the same essay for decades and he's always wrong.
By @addisonj - 7 months
I want to make clear that I agree with the author that the current way in which AI is being used to make a 10x improvement in efficiency is NOT going to work, and it is for exactly the reason stated:

> 10x-ing requires deep conceptual understanding – exactly what GenAI lacks. Writing lines of code without that understanding can only help so much.

IMHO, the real opportunity with AI is to make developers less critical as the only ones who translate conceptual understanding of the business problem into code, and to get developers more focused on the domain they should be experts in, which is the large-scale structure of the system and how that maps to actual compute.

What I mean by that... developers are, on average, not good at having a deep conceptual understanding of the business domain... and maybe have that understanding in the computational domain. All of the 10x developers I have ever met are that mostly because they know both the business domain (or can learn it really quickly) and the computational domain, so they can quickly deliver value because they understand the problem end-to-end.

My internal model for thinking about this is that we need developers building the right "intermediate representation" of a problem space and managing the system at large, but we need domain experts, likely supported by AI assistants, to use that IR to express the actual business problem. That is a super loose explanation... but if you have been in the industry a while, you have probably felt the insanity of how long it can take to ship a fairly small feature because of a huge disconnect between the broader structure of the system (with microservices, async job systems, data pipelines, etc., which we can think of as the low-level representation of the system at large) and the high-level requirement of needing to store some data and process it according to some business-specific rules.

I have no idea if we actually can get there as an industry... it is more likely we just have more terrible code... but maybe not

By @tobyhinloopen - 7 months
The real reason ChatGPT saves time is because it is faster to provide inaccurate answers than Google.
By @bitdeep - 7 months
Tools like aider or Cursor Composer do not help with complex code, as they destroy your mental code model of the solution you're working on.

These tools help a bit with initial mocks, but even then I don't like them, as they create code I don't know, and I need to review it all to find where things are.

When you are writing complex software, you need to build a good mental code model so you know where things are, especially when you need to start debugging issues; not knowing where things are is a mess and very annoying.

These days, I almost don't use these tools anymore; I just prefer basic line auto-completion.

By @m463 - 7 months
My first job at college had schedules with high-level and detailed design, then coding, then integration and test.

Coding was only 3-5% of the time.

Also, I remember learning to race. I tried to learn to go faster through the turns.

But it turns out the way to be fast was to go a little faster on the straightaway and long turns. Trying to go faster through the tight turns doesn't help much.

If you go 1 mph faster for 10 feet in the slowest turn, it doesn't make as much difference as going .1 mph faster in a long 500 foot turn.
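The racing arithmetic above can be checked directly: time saved on a segment is length/speed before minus length/speed after. A minimal sketch, where the segment lengths and speed gains come from the comment but the base speeds (30 mph in the tight turn, 60 mph in the long turn) are assumed purely for illustration:

```python
MPH_TO_FPS = 5280 / 3600  # feet per second per 1 mph

def time_saved(length_ft: float, base_mph: float, gain_mph: float) -> float:
    """Seconds saved over a segment by raising speed from base to base + gain."""
    before = length_ft / (base_mph * MPH_TO_FPS)
    after = length_ft / ((base_mph + gain_mph) * MPH_TO_FPS)
    return before - after

tight_turn = time_saved(10, 30, 1.0)   # +1 mph over a 10 ft tight turn
long_turn = time_saved(500, 60, 0.1)   # +0.1 mph over a 500 ft long turn

print(f"tight turn: {tight_turn * 1000:.1f} ms saved")
print(f"long turn:  {long_turn * 1000:.1f} ms saved")
```

Under these assumed speeds, the tiny gain on the long segment saves more total time than the larger gain on the short one, which is the commenter's point: optimize where you spend the most time.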

so...

Are we trying to haphazardly optimize for the small amount of time we code, when we should spend it elsewhere?

By @rekttrader - 7 months
As a CTO of a new startup, I am not hiring contractors or employees to build software. This may not be 10x, but when the time comes and I need scale I’m likely to hire what will be my smallest team yet.

One of the requirements for hiring is that they be proficient with copilot, cursor, or have rolled their own llama based code assistant.

It’s like going from manual hand tools to power drills and nail guns. Those who doubt that AI will change all technology jobs and work in the industry are going to find themselves doing something else.

By @nateless - 7 months
Completely agree.

Tried spinning up a side project with just Cursor and GPT/Claude models, and we're definitely not at 'AI can build apps from just a prompt' yet. It couldn’t even create a dropdown filter like Linear’s, and the code was buggy, inefficient, and far from the full functionality I asked for. Maybe it's the niche language (Elixir), but we’re not there yet.

By @hluska - 7 months
I can only speak about what it’s done to productivity. A Gen AI is a very ambitious intern who will do anything I ask. I can keep making it refine things, or take what it has and finish it myself. There are no hurt feelings and I don’t have to worry about keeping my Gen AI motivated.

It certainly doesn’t write production quality code, but that entire first paragraph was science fiction a short time ago.

By @ratedgene - 7 months
I believe it's changed everything forever. I'm not even on the hype train. It's difficult for humans to estimate that impact, so it's easier to just deny it. It's a fundamental shift on so many levels; we haven't even begun to scratch the surface, and its import extends not just to engineering but to all other facets of life. The future is accelerating.
By @rkunal - 7 months
I will believe GenAI has become 10x when we notice a larger number of Emacs packages, or at least a few attempts to rewrite famous old software like Calibre. I am sure it is being used at large scale in web apps and cloud tech. I'm not aware of the same in other domains.

May be in the future, everything will become a web app because genAI became so good at it.

By @janalsncm - 7 months
If you want to know how valuable GenAI is for software engineering, you don’t ask AI hype people on Twitter. You also don’t ask Gary Marcus. You ask people trying to complete software engineering tasks.

I guess part of the reason we can’t agree on whether they are useful is that their usefulness depends on how familiar you are with the programming task.

By @codingwagie - 7 months
I'll never understand people that cannot project current progress into what we may see with further progress.
By @richard___ - 7 months
This guy doesn't code and is breaking his back looking for data to support his biased hypothesis.
By @xnx - 7 months
There's a big difference between "AI isn't 10x-ing computer programming" and "AI will not 10x computer programming". Am I missing the part of the article where some evidence is provided that the situation won't improve?
By @zarzavat - 7 months
There seem to be many people who are at pains to tell us what generative AI won't do in the future.

Any such prediction seems extremely likely to be false without a time bound, or an argument to physical impossibility.

By @munchausen42 - 7 months
Funny to see how being anti-GenAI and anti-LLM is now the new en vogue on HN. Can't wait till that dies off as well.
By @yawnxyz - 7 months
Keep telling other devs to stop using AI for code!!! I don't want to lose my edge lol
By @throwawa14223 - 7 months
I'd be shocked if it wasn't a negative at the end of the day.
By @sva_ - 7 months
This guy really tries too hard to make himself relevant.
By @dmitrygr - 7 months
For low level work (kernels, drivers, C, assembly), things like ChatGPT and Cursor are a 0.1X multiplier. They suggest idiotic things both design-wise and implementation-wise. In this world, I’ll take my cat’s help over theirs. Maybe it’ll improve, but I won’t hold my breath.
By @rogerclark - 7 months
It already has, for better or worse. Why does anyone still take this guy seriously? It's one thing to be skeptical that AI is going to make the world a better place... it's another thing to be skeptical that it exists and actually does things.
By @geenkeuse - 7 months
Looks like you guys are coming around. For the longest time it was the same "AI will never ever do what we do" song.

Now you are all mostly singing its praises.

Good. You will be left standing when the stubborn are forced to call it quits.

What is so wrong with having a computer do amazing things for you simply by asking it?

That was the dream and the mission, wasn't it?

Unless "obfuscation is the enemy"

Most people don't want the IT guy to get into the details of how they did what they did. They just want to get on with it. Goes for the guy replacing your hard drive. Goes for the guy writing super complicated programs. IT is a commodity. A grudge purchase for most companies. But you know that, deep down. Don't you?

The bar has been lowered and raised at the same time. Amazing.