September 28th, 2024

Researchers seeing little evidence of benefit from copilots

A study by Uplevel found that AI coding assistants like GitHub Copilot do not significantly improve developer productivity and may increase bugs, with mixed results across different companies.

Read original article

A recent study by Uplevel found that developers are not seeing significant productivity gains from AI coding assistants like GitHub Copilot. The study tracked roughly 800 developers over six months and found no notable improvement in key programming metrics such as pull request cycle time and throughput. In fact, use of Copilot was associated with a 41% increase in bugs. While some developers report feeling more productive, the study suggests that many are spending more time reviewing AI-generated code than writing it.

The study also indicated that AI tools have not alleviated developer burnout: those using Copilot saw no reduction in after-hours work compared with those who did not. Results were mixed across companies. Some, like Innovative Solutions, reported significant productivity increases, while others, such as Gehtsoft USA, found AI-generated code difficult to debug and often preferred rewriting it from scratch. The findings argue for tempered expectations: AI coding assistants are not a replacement for human developers but tools that can enhance certain aspects of the coding process.

- A study found no significant productivity gains from AI coding assistants like GitHub Copilot.

- The use of Copilot was linked to a 41% increase in bugs.

- Developers reported spending more time reviewing AI-generated code rather than writing it.

- Mixed results were observed across different companies regarding the effectiveness of AI coding tools.

- Expectations about AI coding assistants should be moderated; they are not replacements for human developers.

AI: What people are saying
The comments reflect a range of experiences and opinions regarding AI coding assistants, particularly their impact on productivity and code quality.
  • Many users report mixed experiences, with some finding AI tools helpful for repetitive tasks while others encounter issues with incorrect code suggestions.
  • Several commenters note an increase in bugs and incidents correlated with the use of AI tools, attributing this to misplaced trust in the technology.
  • Some users emphasize the importance of verification and caution when using AI-generated code, likening it to working with human collaborators.
  • There is a divide in opinions on productivity, with some users claiming significant boosts while others feel the tools complicate their workflow.
  • Comments highlight the need for proper usage and understanding of AI tools, suggesting they are most effective when used as aids rather than replacements for human coding skills.
14 comments
By @jchw - 4 months
This is my personal experience, despite everyone swearing that it's a game changer. I've tried a fair few times now, because people keep insisting these tools are revolutionary, but I find them almost as annoying as they are helpful. As many others have noticed, you really have to be careful before accepting the code they output as correct; the subtly incorrect bits are extremely insidious. An example: in one case I was using an AI tool to write some reasonably tricky validation logic, and at some point it got something very close to right but flipped part of a conditional. It took me probably 30 minutes to notice, even though it should have been pretty obvious.
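To make that failure mode concrete, here's a hypothetical reconstruction (not the commenter's actual code) of the kind of flipped conditional that slips past review:

    from datetime import date

    def is_valid_booking(start: date, end: date, max_days: int = 30) -> bool:
        """Validate a booking window: start must precede end, within max_days."""
        if start >= end:
            return False
        # The subtle bug described above: the comparison direction gets flipped.
        # Correct:        (end - start).days <= max_days
        # As generated:   (end - start).days >= max_days
        return (end - start).days <= max_days

    # Example: a 10-day stay in a 30-day window is valid.
    assert is_valid_booking(date(2024, 9, 1), date(2024, 9, 11))

Both versions type-check, both pass a casual read, and only one of them is right.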

The best I can say is that the implementation in JetBrains IntelliJ IDEA is pretty good. It's basically only useful for some repetitive Java boilerplate, but that's actually perfect: it's mindless enough yet easy to validate. It makes me dislike programming in Java a little bit less.

By @nbbnbb - 4 months
I don't have any formal data I could share without losing anonymity (and probably getting sued by my employer), but the introduction of these tools at my organisation correlates directly with a measurable rise in bugs and incidents. From causal analysis, the tools themselves are not directly responsible as such, despite their limited veracity; rather, people trust them and don't do their jobs properly. There is also a mystique around them being the solution to all validation processes, which leads to suboptimal attention at the validation stage, in the hope that some vendor we already have will magically make a problem go away like they said they would at the last conference. I figure the net gain turned negative, socially and humanly, the moment the idea was commercialised.

Urgh. I can't wait to retire.

By @wanderingbit - 4 months
This finding bewilders me, because my copilot (I use Sourcegraph’s Cody) has become an essential part of my dev productivity toolset. Being able to get answers to questions that would normally break me out of flow mode by simply Option + C’ing to open up a New Chat has been a productivity boost for me. Getting it to give me little snippets of code that I can use helps keep me in flow mode. Getting it to do a first pass on function comments, which I then edit, has made it much easier to get over the activation energy barrier that usually holds me back from doing full commenting.

I can’t say if the bug count is higher or not. Maybe it is higher in terms of the total number of bugs I write throughout a coding session. But if bug count goes up 10% while the speed with which I fix those bugs and get to a final edit of my code is 30% or 40% faster, then raw bug count is not the right metric.
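As a back-of-envelope check on that trade-off (the numbers below are the commenter's illustrative figures, not measurements):

    # Illustrative arithmetic: 10% more bugs, each fixed ~35% faster.
    baseline_bugs = 100        # bugs per period without the assistant
    baseline_fix_hours = 1.0   # average hours to fix one bug

    ai_bugs = baseline_bugs * 1.10
    ai_fix_hours = baseline_fix_hours * (1 - 0.35)

    print(baseline_bugs * baseline_fix_hours)  # 100.0 hours of fixing
    print(ai_bugs * ai_fix_hours)              # 71.5 hours despite more bugs

Under those assumptions the total time spent on bugs still drops, which is the commenter's point about bug count being the wrong metric on its own.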

Maybe the differentiator is that I'm a solo dev for all this work, so the negative effects of the copilot are only experienced by me. If I were on a 10-person team, the bugs and the weird out-of-context code snippets would be magnified across the 9 other people, and the negative effects would be stronger. But I don't know.

By @pfisherman - 4 months
Had the chance to watch some non-programmers use Copilot for data science (using pandas) and it was an eye-opening experience. I came away with the feeling that the tool landed in a sort of “uncanny valley” of productivity. If you can’t write the code without Copilot, then you won’t be able to debug the errors it makes. And if you know enough to spot and debug the errors, then Copilot just gets in the way.
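A hypothetical example of that valley (invented data, but a classic pandas trap an assistant can produce and a non-programmer can't debug):

    import pandas as pd

    df = pd.DataFrame({"user": ["a", "a", "b"], "spend": [10, 10, 30]})

    # Plausible-looking suggestion: drop duplicate rows, then average spend.
    # Bug: drop_duplicates() returns a new frame; without reassignment the
    # duplicate row silently remains and skews the mean.
    df.drop_duplicates()
    print(df["spend"].mean())  # 16.67 -- duplicates were never removed

    df = df.drop_duplicates()
    print(df["spend"].mean())  # 20.0 -- the intended answer

Nothing errors out in the buggy version; the number is simply wrong, which is exactly the kind of failure that requires knowing the library to catch.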
By @thepuppet33r - 4 months
Genuinely thought this was an article about copilots in planes and was terrified that airlines were going to cut back to one pilot in the cockpit to save a little more money.
By @stumblehump - 4 months
For me the helpful-AI story is sadly less about 'done better' and more about 'done at all'. Let me illustrate: I work in a mostly tech-illiterate context, a big corporation where everything is proprietary and five departments away, so nobody would lift a finger for your customisation wishes without a month of foreplay. Me, a former teenage script kiddie (CSS hacks for Myspace, minor AS2 template diddling), I started with getting more complicated macros for Sheets. Then some semi-interactive Apps Script to collate and structure data. A bit of FTP automation here, JSON parsing there, geodata alignment, context-aware scrapes, PowerShell, minor Python one-shots... and now, thanks to our new shitcode overlords, I find myself in the cockpit of a MacGyver supermachine, able to run dozens of tasks that are no longer manual, no longer weeks in processing, no longer subcontractor-sloppy. Funnily enough, I've also started seeing patterns, typing out changes in-code, refactoring and integrating old chunks. Why not, then?
By @taftster - 4 months
I think an interesting use for Copilot would be to ask it to find a bug given the description of an observed behavior. Let's say you're not super familiar with a code base, but you have found a bug (or "feature") that should be addressed. Having Copilot narrow in on the likely code points for addressing the issue would be invaluable.

Additionally, I find the copilot code suggestions during code reviews / pull requests sometimes useful. At times, it can offer some insightful bits about a code segment, such as potential exception handling fixes, etc.

I'd like to explore having Copilot write unit tests, including representative test data, that exercise edge code paths. I haven't done this yet, but this seems exactly the type of thing that a "copilot" should do for me (not too unlike pair programming, maybe).
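As a sketch of what that might look like (the function and edge cases below are hypothetical, written with pytest):

    import pytest

    def parse_port(value: str) -> int:
        """Hypothetical function under test: parse a TCP port number."""
        port = int(value)
        if not 0 < port <= 65535:
            raise ValueError(f"port out of range: {port}")
        return port

    # Edge-path cases of the sort a copilot could be asked to draft:
    @pytest.mark.parametrize("raw,expected", [("1", 1), ("80", 80), ("65535", 65535)])
    def test_valid_ports(raw, expected):
        assert parse_port(raw) == expected

    @pytest.mark.parametrize("raw", ["0", "65536", "-1", "", "http"])
    def test_invalid_ports(raw):
        with pytest.raises(ValueError):
            parse_port(raw)

The boundary values and malformed inputs are the tedious part a human tends to skimp on, and the part that is cheap to review once drafted.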

Having a copilot completely write my code base, that's another thing entirely. There would be too much going back and verifying that it got it right. I've also seen it conjure up completely bogus solutions. For example, I've had Copilot offer a configuration change that was entirely fabricated; it looked legitimate enough that a senior systems engineer attempted to install/deliver the "fix" it offered, when the suggestion was completely made up.

Overall, I guess my experience with copilot is not much different than working with any human. Trust but verify.

By @gtvwill - 4 months
Eh, common theme amongst coders, but I feel like it's less the LLM and more PEBKAC. You have a new tool that can be hugely productive; you just need to use it right. Stop expecting it to write your whole app or invent new formulas for hyper-complex problems that haven't yet been solved or aren't common. It's a reference tool that's better than reading the docs or browsing Stack Overflow. Ask it for snippets of code, break up your tasks, use it to compare a number of methods that achieve the same result, discuss different approaches to the same problem with it.

Much like how a nail gun won't just magically build you a house; it'll just let you build one quicker.
I get great benefit out of LLMs for coding. I'm not a good coder, but I am decent at planning and understanding what I want, and LLMs get me there 100x quicker than not using them. I don't need four years of CS to learn all the tedious algorithms for sorting or searching; I just ask an AI for a bunch of examples, assess them for what they are, and get on with it. It can tell me the common pros and cons, and much like any other decision in business, I make my best judgement and go with it.

Need to sort a heap of data into x, y, z, or convert it from x to y? An LLM will show me the way; now I don't need to hire someone to do it for me.
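For instance, the kind of sort-and-convert snippet being described (file and column names are invented for illustration):

    import csv
    import json

    # Hypothetical task: read CSV rows, sort by region then amount, emit JSON.
    with open("records.csv", newline="") as f:
        rows = list(csv.DictReader(f))

    rows.sort(key=lambda r: (r["region"], float(r["amount"])))

    with open("records.json", "w") as f:
        json.dump(rows, f, indent=2)

Small, verifiable, and exactly the scale of task where a generated snippet is easy to check before trusting.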

But alas, so many seem to think a language-interpretation tool is actually a do-it-all, one-stop shop of production. PEBKAC: you're using the tool wrong.

By @mxxx - 4 months
The thing that I’ve seen my team use it most for is explaining blocks of code. We maintain a bunch of legacy systems that don’t get touched often and are written in stacks that our engineers aren’t completely fluent with, and it can be helpful when they’ve traced an issue to a particular function but the original intent or purpose of the code is obtuse.
By @Eisenstein - 4 months
> “Using LLMs to improve your productivity requires both the LLM to be competitive with an actual human in its abilities

No it does not. Does an assistant have to be as qualified as their boss?

> “The LLM does not possess critical thinking, self-awareness, or the ability to think.”

This is completely irrelevant. The LLM can understand your instructions and it can type 30,000 times faster than you.

By @marcinzm - 4 months
Cursor IDE with Claude 3.5 has been very beneficial for my productivity. Other tools, a lot less so.
By @itsdrewmiller - 4 months
o1 is the first time I've really trusted the output of coding prompts beyond glorified autocomplete - it's a cut above what's currently out there.