How I use LLMs as a staff engineer
Sean Goedecke discusses the benefits and limitations of large language models in software engineering, highlighting their value in code writing and learning, while remaining cautious about their reliability for complex tasks.
Sean Goedecke, a staff engineer, shares his perspective on the use of large language models (LLMs) in software engineering, highlighting both their benefits and limitations. He notes a divide among engineers regarding LLMs, with some viewing them as revolutionary and others as overhyped. Goedecke finds significant value in LLMs, particularly in tasks such as writing production code, where he uses tools like Copilot for boilerplate code and to make tactical changes in unfamiliar programming languages. He emphasizes the efficiency of LLMs in generating throwaway code for research purposes, claiming they can expedite the process by 2x to 4x. Additionally, he utilizes LLMs as a learning tool, asking questions and receiving feedback on his understanding of new domains. While he occasionally seeks help with bug fixes, he prefers to rely on his own skills, as LLMs often struggle with complex issues. For written communication, he uses LLMs for proofreading and catching typos but does not allow them to draft documents. Overall, he appreciates LLMs for specific tasks but remains cautious about their limitations, particularly in areas where he has expertise.
- Sean Goedecke finds LLMs valuable for tasks like code writing and learning new domains.
- He uses LLMs primarily for boilerplate code, throwaway research code, and as a tutor.
- Goedecke is cautious about relying on LLMs for bug fixes and prefers his own debugging skills.
- He employs LLMs for proofreading but does not let them draft his written communications.
- He acknowledges the divide in the engineering community regarding the utility of LLMs.
Related
In my opinion, using LLMs to write code is a Faustian bargain where you learn terrible practices and come to rely on code quantity, boilerplate, and nondeterministic outputs - all hallmarks of poor software craftsmanship. Until ML can actually go end to end from requirements to product and they fire all of us, you can't cut corners on building intuition as a human by forgoing reading and writing code yourself.
I do think there is a place for LLMs in generating ideas or exploring an untrusted knowledge base of information, but using code generated by an LLM is pure madness unless what you are building is truly going to be thrown away and rewritten from scratch; the same goes for relying on it as a linting or debugging tool or as a source of truth.
> I don’t do this a lot, but sometimes when I’m really stuck on a bug, I’ll attach the entire file or files to Copilot chat, paste the error message, and just ask “can you help?”
The "reasoning" models are MUCH better than this. I've had genuinely fantastic results with this kind of thing against o1 and Gemini Thinking and the new o3-mini - I paste in the whole codebase (usually via my https://github.com/simonw/files-to-prompt tool) and describe the bug or just paste in the error message and the model frequently finds the source, sometimes following the path through several modules to get there.
Here's a slightly older example: https://gist.github.com/simonw/03776d9f80534aa8e5348580dc6a8... - finding a bug in some Django middleware
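For anyone curious what that workflow looks like mechanically, the idea is just concatenating the relevant files, each prefixed with its path, into one big prompt along with the error message. Here's a minimal Python sketch of that idea (an illustration only, not the actual files-to-prompt tool; the directory name and suffix filter are assumptions):

```python
from pathlib import Path

def build_prompt(root: str, suffixes=(".py",)) -> str:
    """Concatenate source files, each prefixed with its path, into one prompt string."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            parts.append(f"--- {path} ---\n{path.read_text()}")
    return "\n\n".join(parts)

if __name__ == "__main__":
    # Example traceback; in practice you'd paste in the real error message.
    error = "TypeError: 'NoneType' object is not subscriptable"
    prompt = build_prompt("src") + "\n\nHere is the error -- can you find the bug?\n" + error
    print(prompt)  # paste the output into o1 / Gemini Thinking / o3-mini
```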
LLMs can absolutely bust out corporate docs crazy fast too... though that's probably a good moment to re-evaluate how much value those docs were adding in the first place.
Ah, now it makes sense.
Then I reflected on how very true that was. In fact, as of writing this there are 138 comments, and I scrolled through what was shown to gauge the negative/neutral/positive split, based on a highly subjective personal assessment: roughly 2/3 were negative, so I decided to stop.
As a profession, it seems many of us have become accustomed to dealing in absolutes when reality is subjective, judging LLMs prematurely with a level of perfectionism not even cast upon fellow humans... or at least, if it were cast upon humans, I'd be glad not to be their colleague.
Honestly, right now I would use this as a litmus test in hiring, and the majority would fail based upon their closed-mindedness and inability to understand how to effectively utilise the tools at their disposal. It won't exist as a signal for much longer, sadly!
See this is what I don't get about the AI Evangelists. Every time I use the technology I am astounded at the amount of incorrect information and straight up fantasy it invents. When someone tells me that they just don't see it, I have to wonder what is motivating them to lie. There is simply no way you're using the same technology as me with such wildly different results.
This is how I use AI at work for maintaining Python projects; Python is a language I'm not really well versed in. Sometimes I might add "this is how I would do it in …, how would I do this in Python?"
I find this extremely helpful and productive, especially as I have to pull the code onto a server to test it.
One thing that is not mentioned -- code review. It is not great at it, often pointing out trivial issues or non-issues. But if it finds 1 area for improvement out of 10 bullet points, that's still worth it -- most human code reviewers don't notice all the issues in the code anyway.
--
I work on Graphite Reviewer (https://graphite.dev/features/reviewer). I'm also partly dyslexic. I lean massively on Grammarly (using it to write this comment) and type-safe compiled languages. When I was an engineer at Airbnb, I caused multiple site outages due to typos in my Ruby code that I didn't see and wasn't able to execute before prod.
The ability of LLMs to proofread code is a godsend. We've tuned Graphite Reviewer to shut up about subjective stylistic comments and focus on real bugs, mistakes, and typos. Fascinatingly, it catches a minor mistake in ~1/5 PRs in prod at real companies (we've run it on a few million PRs now). The issues it catches result in a pre-merge code change 75% of the time, about equal to what a human comment does.
AIs aren't perfect, but I'm thrilled that they work as fancy code spell-checkers :)
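A minimal sketch of that "fancy code spell-checker" idea, for the curious: prompt a model to flag only concrete bugs and typos in a diff and to stay silent on style. This is just an illustration of the general technique, not how Graphite Reviewer actually works; the model name, prompt wording, and use of the OpenAI client are assumptions.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def review_diff(diff: str) -> str:
    """Ask a chat model to flag only concrete bugs/typos in a diff, not style."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: any capable chat model would do
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a code reviewer. Report only concrete bugs, typos, "
                    "or mistakes introduced by this diff. Do not comment on style "
                    "or naming. If you find nothing, reply with exactly: LGTM."
                ),
            },
            {"role": "user", "content": diff},
        ],
    )
    return response.choices[0].message.content
```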
Copilot is used for simple boilerplate code, and also for autocomplete. It's often a starting point for unit tests (but a thorough review is needed - you can't just accept it; I've seen it misinterpret code). I started experimenting with RA.Aid (https://github.com/ai-christianson/RA.Aid) after seeing a post on it here today. The multi-step actions are very promising. I'm about to try files-to-prompt (https://github.com/simonw/files-to-prompt), mentioned elsewhere in the thread.
For now, LLMs are a level-up in tooling, but not a replacement for developers (at least not yet).
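To make the unit-test point concrete, here's the flavour of scaffold an assistant typically drafts for a simple function. Everything below is hypothetical (the function and tests are made up for illustration); the boundary case is exactly the kind of detail worth verifying rather than accepting blindly.

```python
# Hypothetical function under test (made up for illustration).
def is_adult(age: int) -> bool:
    return age >= 18

# Typical assistant-generated scaffold: quick to accept, but check each
# assertion -- assistants can misread the intended behaviour, especially
# at boundaries.
def test_is_adult_above_threshold():
    assert is_adult(30)

def test_is_adult_below_threshold():
    assert not is_adult(12)

def test_is_adult_at_boundary():
    # Verify against the actual requirement: is exactly 18 considered adult?
    assert is_adult(18)
```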
1. Try to write some code
2. Wonder why my IDE is providing irrelevant, confusing and obnoxious suggestions
3. Realize the AI completion plugin somehow turned itself back on
4. Turn it off
5. Do my job better than everyone that didn't do step 4
The question I keep asking myself is, "Should we be making tools that auto-write code for us, or should we be using this training data to suss out the tools we're missing -- the places where everyone writes the same code 10 times in their careers?"
Such an unnecessary flex.
At the end, I ask it to give me a quiz on everything we talked about and any other insights I might have missed. Instead of typing out the answers, I just use Apple Dictation to transcribe my answers directly.
It's only recently that I thought to take the conversation I just had and have it write a blog post of the insights and a-ha moments I had. It takes a fair bit of curation to get it to do that, however. I can't just say, "write me a blog post on all we talked about". I have to first get it to write an outline with the key insights. And then, based on the outline, write each section. And then I'll use ChatGPT's canvas to guide and fine-tune each section.
However, at no point do I have to specifically write the actual text. I mostly do curation.
I feel ok about doing this, and don't consider it AI slop, because I clearly mark at the top that I didn't write a word of it and that it's the result of a curated conversation with 4o. In addition, I think if most people did this as a result of their own Socratic method with an AI, it'd build up enough training data for the next generation of AI to do a better job of writing pedagogical explanations, posts, and quizzes -- helping people learn topics that are just out of reach but where there haven't been many people able to bridge the gap.
The two I had it write are: Effects as Protocols and Contexts as Agents: https://interjectedfuture.com/effects-as-protocols-and-conte...
How free monads and functors represent syntax for algebraic effects: https://interjectedfuture.com/how-the-free-monad-and-functor...
Coding assistant LLMs have changed how I work in a couple of ways:
1) They make it a lot easier to context switch between e.g. writing kernel code one day and a Pandas notebook the next, because you're no longer handicapped by slightly forgetting the idiosyncrasies of every single language. It's like having smart code search and documentation search built into the autocomplete.
2) They can do simple transformations of existing code really well, like generating a match expression from an enum. They can extrapolate the rest from 2-3 examples of something repetitive, like converting from Rust types into corresponding Arrow types.
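As a concrete illustration of that first kind of transformation: the commenter's examples are in Rust, but this hypothetical Python equivalent shows the shape of the mechanical expansion an assistant fills in.

```python
from enum import Enum, auto

class Status(Enum):
    PENDING = auto()
    ACTIVE = auto()
    CLOSED = auto()

# The kind of expansion an assistant autocompletes from the enum above:
# one case per variant, ready for the human to fill in or adjust.
def describe(status: Status) -> str:
    match status:
        case Status.PENDING:
            return "waiting to start"
        case Status.ACTIVE:
            return "in progress"
        case Status.CLOSED:
            return "finished"
```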
I don't find the other use cases the author brings up realistic. The AI is terrible at code review and I have never seen it spot a logic error I missed. Asking the AI to explain how e.g. Unity works might feel nice, but the answers are at least 40% total bullshit and I think it's easier to just read the documentation.
I still get a lot of use out of Copilot. The speed boost and removal of friction lets me work on more stacks and, consequently, lead a much bigger span of related projects. Instead of explaining how to do something to a junior engineer, I can often just do it myself.
I don't understand how fresh grads can get use out of these things, though. Tools like Copilot need a lot of hand-holding. You can get them to follow simple instructions over a moderate amount of existing code, which works most of the time, or ask them to do something you don't exactly know how to do without looking it up, and then it's a crapshoot.
The main reason I get a lot of mileage out of Copilot is exactly because I have been doing this job for two decades and understand what's happening. People who are starting in the industry today, IMO, should be very judicious with how they use these tools, lest they end up with only a superficial knowledge of computing. Every project is a chance to learn, and by going all trial-and-error with a chatbot you're robbing yourself of that. (Not to mention the resulting code is almost certainly half-broken.)
That's bad because it makes "not training your juniors" the default path for senior people.
I can assign the task to one of my junior engineers and they will take several days of back and forth with me to work out the details--that's annoying but it's how you train the next generation.
Or I can ask the LLM and it will spit back something from its innards that got indexed from Github or StackOverflow. And for a "junior engineer" task it will probably be correct with the occasional hallucination--just like my junior engineers. And all I have to do for the LLM is click a couple of keys.
With all the talk of o1-pro as a superb staff-engineer-level architect, it took me a while to re-parse this headline and understand what the author, apparently a staff engineer, meant.
I stick to a "no copy & paste" rule and that includes autocomplete. Interactions are a conversation but I write all my code myself.
I would be so bored if my job consisted of writing prompts all day long.
- imprecise semantic search
- simple auto-completion (1-5 tokens)
- copying patterns with substitutions
- inserting commonly-used templates
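To illustrate the "copying patterns with substitutions" case with a hypothetical Python example (the field names and parsers are made up): you write the first entry or two by hand and the completion extrapolates the rest of the pattern.

```python
from datetime import datetime

def parse_timestamp(value: str) -> datetime:
    return datetime.fromisoformat(value)

# After the first one or two entries, autocomplete reliably extrapolates
# the remaining (field, parser) pairs from the established pattern.
FIELD_PARSERS = {
    "created_at": parse_timestamp,
    "updated_at": parse_timestamp,
    "deleted_at": parse_timestamp,
    "user_id": int,
    "amount": float,
}

def parse_record(raw: dict) -> dict:
    """Apply the per-field parsers, defaulting to str for unknown fields."""
    return {key: FIELD_PARSERS.get(key, str)(value) for key, value in raw.items()}
```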