August 4th, 2024

GitHub Copilot – Lessons

Siddharth discusses GitHub Copilot's strengths in pair programming and learning new languages, but notes its limitations with complex tasks, verbosity, and potential impact on problem-solving skills among new programmers.


Siddharth reflects on his experience with GitHub Copilot, drawing on eight years of pair programming. Initially impressed by the tool's capabilities, he later identified several limitations. While Copilot performs well on LeetCode-style problems, it struggles with real-world programming tasks that involve complex business logic spread across multiple files; the detailed prompts required for effective use often lead programmers to prefer writing the code themselves. Copilot is nonetheless useful for scanning existing code for errors, generating tests, and understanding unfamiliar or obfuscated codebases. It also aids in learning new programming languages, particularly their idiomatic expressions, though it should not replace traditional learning resources. It excels in languages with extensive open-source corpora, such as JavaScript, Python, and Java, but is less effective for niche technologies. Users may find it overly verbose and distracting, as it frequently offers low-quality suggestions without considering context, which raises concerns about a decline in problem-solving skills among new programmers. Overall, while GitHub Copilot is a useful tool in certain scenarios, its limitations and the need for careful usage are evident.

10 comments
By @jmaker - 2 months
There have been many exaggerated claims surfacing across Twitter and the blogosphere. What resonates best with my own experience is that Copilot is okay on simple tasks, but on anything beyond that it misguides, confuses, and breaks your flow of thought. I like that sort of smart autocompletion and snippet retrieval for boilerplate and template code, but for the business logic proper it's just awful. Either it doesn't get it at all and wants to insert some crap (and the IDE isn't always helpful in recognizing that it's distracting you), or it hallucinates something that seems to fit at first sight, but moments after you accept the suggestion you have to undo it, because you realize how far off it is and how it misrepresents the domain entities or logic. It's as if a junior dev suddenly injected that code into your shared file buffer in a pair programming session while you were focusing. So while Copilot autocomplete is on, you always have to analyze what a gung-ho quasi-junior dev knee-jerks into your file buffer.

As many have witnessed, Copilot Chat has been just terrible. I've given up my hopes for the current wave of AI evolution.

What is uniform across all LLMs is the lack of nuance. Even when they manage to generate some domain-specific output that does make sense, there's always a lack of detail and nuance about the domain, regardless of how you tweak your prompts to retrieve something from a very specific context. I'm impressed by how much easier it's become for me to get a digest of longer texts, but at the same time I'm disappointed by the quality of the results. It is very rare that I get what I actually ask for. It's like talking to a mid-level consultant who pretends to know everything but whose output is rather questionable, and you just give up and seek to end the meeting.

By @igammarays - 2 months
For me, Copilot and its sisters have been like self-driving cars: a slightly more advanced IDE autocomplete, analogous to a slightly more advanced cruise control. But it's far from Level 5 self-driving, and it's not obvious whether we will ever get there.

You still have to keep your hands on the wheel, and you still need driving expertise. But since bad uncommitted code has far less disastrous potential consequences than bad driving, the tool is much more usable.

I still don't believe the people who claim they are using AI to write or rewrite entire codebases. Maybe for the first version of toy projects. But I've yet to see anyone using AI to automatically write entire features that span an enterprise software codebase.

By @mellosouls - 2 months
> Copilot (that uses GPT 4 underneath)

I'm not sure this is true. Copilot Chat uses GPT-4o, but I've never seen a clarifying statement that the more immediate inline Copilot uses anything more than a GPT-3 variant.

You would think that if that were the case, it would have been heavily marketed.

The thing is, if people using Copilot are assuming GPT-4 but actually seeing the results of GPT-3, that goes a significant way toward explaining why they are sometimes underwhelmed or disappointed.

I'm happy to be corrected with a link that specifically clarifies that inline Copilot uses GPT-4 or later.

By @elzbardico - 2 months
Every time I see someone extolling the virtues of LLM code generators I can't help but have the opinion that they must be shitty developers.

Copilot, Cody, JetBrains' AI assistant: all of them are helpful tools for a small set of tasks, like a super-charged code completion tool, but nothing more than that.

By @Quothling - 2 months
We use Copilot as fancy auto-complete. Our original hopes for it were higher, but it hasn't been up to the task. Yes, it solves LeetCode problems as the article mentions, but that is next to useless for most of what our developers do. What isn't useless is how good it is at producing code snippets, especially because it's very transferable between developers: they no longer need to build up an archive of personal snippets, or at least not as many of them. So it's much easier to onboard new developers and make them productive than it was before Copilot.

I don't think anyone at our shop has high hopes for LLMs in programming beyond efficiency anymore. I'd like to see GitHub Copilot head in a direction where it's capable of auto-updating documentation such as JSDoc when functionality changes. LLMs are already excellent at writing documentation for "good" code, but the real trick is keeping it up to date as things change. I know this is also a change-management issue, but in the world where I mainly work, the time to properly maintain things isn't always prioritized by the business at large. That obviously costs the business down the line, often grievously so, but as long as "IT" has very little pull in many organisations, it's just the state of things. I'd personally love for them to get better at writing and updating tests, but so far we've been far less successful with that than the author has.

As far as efficiency and quality go, our in-house measurements point in two directions. For inexperienced developers, quality has dropped with the use of LLMs, which in our house comes down entirely to how employees (and not just developers) tend to trust LLMs more than they would trust search results. So much so that a lot of AI usage has basically been banned from the wider organisation by the upper decision makers, because quality is important in what we do. Yes, I know this is ironic when you look at how they prioritize IT in an organisation where 90% of our employees use a computer 100% of their working hours.

As far as efficiency goes, there are two sides. When used as fancy auto-complete, we see an increase in work output across every kind of developer; when used as a "sparring partner", we see a significant decrease. We don't have the resources to do a lot of pair programming, and a couple of developers might do direct sparring on computation challenges for 1-2 hours a week. They are free to do more, and they aren't punished for it since we don't do any sort of hourly registration of work, but 1-2 hours is the average. Sometimes it increases when they are dealing with complex business processes or when we're onboarding someone new.

> Copilot is very useful to scan existing code for any errors or missed edge cases

Aside from tests, I think this is the one claim in the article we really haven't seen borne out in our (very anecdotal) testing. Maybe that is down to us still learning how to adopt it properly, or to a difference in coding style? In any case, almost all of our errors aren't in the actual code but rather stem from a misrepresentation, misunderstanding, or unreported change in business logic, and that has been the area where LLMs have been weakest for us.

By @sidcool - 2 months
I am curious: any reason why this post went from the front page to 156+ rank in a matter of minutes?

By @djaouen - 2 months
My whole thing with AI is, "But what if it's wrong?" At least with GitHub Copilot, I can correct the autocompleted output. I am less convinced by other ideas, such as computers built solely around an LLM as the OS.

By @sidcool - 2 months
OP here. Happy to hear any feedback on the content and presentation of the post.