How I Program with LLMs
The author discusses the positive impact of large language models on programming productivity, highlighting their uses in autocomplete, search, and chat-driven programming, while emphasizing the importance of clear objectives.
The author shares their experiences using large language models (LLMs) in programming over the past year, highlighting their positive impact on productivity. They emphasize a proactive approach to integrating LLMs into their workflow, which has led to the development of a tool for Go programming called sketch.dev. The author identifies three primary uses for LLMs: autocomplete, search, and chat-driven programming. Autocomplete enhances productivity by reducing mundane typing, while LLMs provide better answers to complex questions compared to traditional search engines. Chat-driven programming, though challenging, offers significant value by generating initial drafts and ideas, especially when the programmer lacks the energy to start from scratch. The author notes that effective use of LLMs requires clear objectives and manageable complexity to avoid confusion. They also discuss the advantages of smaller code packages, which facilitate LLM interactions and improve code readability. The author concludes that while LLMs can produce errors, they are adept at correcting mistakes when provided with feedback. Overall, the integration of LLMs into programming practices has proven beneficial, particularly in product development contexts.
- LLMs have a net-positive effect on programming productivity.
- Key uses of LLMs include autocomplete, search, and chat-driven programming.
- Smaller code packages enhance LLM interactions and improve code readability.
- Effective use of LLMs requires clear objectives and manageable complexity.
- LLMs can produce errors but are capable of correcting them with feedback.
Related
- Many users find LLMs beneficial for autocomplete and search functionalities, significantly enhancing their coding efficiency.
- Chat-driven programming is seen as a double-edged sword; while it can provide useful starting points, it often generates buggy or incomplete code that requires substantial manual correction.
- Some programmers express skepticism about relying on LLMs for serious projects, emphasizing the importance of understanding the code and maintaining quality.
- Concerns about the long-term implications of LLMs on junior developers and the potential for increased technical debt are prevalent.
- There is a recognition that LLMs serve as valuable tools for brainstorming and debugging, acting as a "thought partner" in the coding process.
His post reminds me of an old idea I had of a language where all you wrote was function signatures and high-level control flow, and maybe some conformance tests around them. The language was designed around filling in the implementations for you. 20 years ago that would have been from a live online database, with implementations vying for popularity on the basis of speed or correctness. Nowadays LLMs would generate most of it on the fly, presumably.
Most ideas are unoriginal, so I wouldn't be surprised if this has been tried already.
This to me is the biggest advantage of LLMs. They dramatically reduce the activation energy of doing something you are unfamiliar with. Much in the way that you're a lot more likely to try kitesurfing if you are at the beach standing next to a kitesurfing instructor.
While LLMs may not yet have human-level depth, it's clear that they already have vastly superhuman breadth. You can argue about the current level of expertise (does it have undergrad knowledge in every field? PhD level knowledge in every field?) but you can't argue about the breadth of fields, nor that the level of expertise improves every year.
My guess is that the programmers who find LLMs useful are people who do a lot of different kinds of programming every week (and thus are constantly going from incompetent to competent in things that other people already know), rather than domain experts who do the same kind of narrow and specialized work every day.
I find chat for search is really helpful (as the article states)
I frequently use what OP refers to as chat-driven programming, and I find it incredibly useful. My process starts by explaining a minimum viable product to the chat, which then generates the code for me. Sometimes, the code requires a bit of manual tweaking, but it’s usually a solid starting point. From there, I describe each new feature I want to add—often pasting in specific functions for the chat to modify or expand.
This approach significantly boosts what I can get done in one coding session. I can take an idea and turn it into something functional on the same day. It allows me to quickly test all my ideas, and if one doesn’t help as expected, I haven’t wasted much time or effort.
The biggest downside, however, is the rapid accumulation of technical debt. The code can get messy quickly. There's often a lot of redundancy and after a few iterations it can be quite daunting to modify.
But when having the LLM do things for me, I frequently run into issues where it feels like I'm wasting my time with an intern. "Chat-based LLMs do best with exam-style questions" really speaks to me; however, I find that constructing my prompts so that the LLM does what I want uses just as much brainpower as just programming the thing myself.
I do find ChatGPT (o1 especially) really good at optimizing existing code.
We had an issue recently with a task queue seemingly randomly stalling. We were able to arrive at the root cause much more quickly than we otherwise would have because of a back-and-forth brainstorming session with Claude, which involved describing the issue we were seeing, pasting in code from the library to ask questions, asking it to write some code to add some missing telemetry, and then probing it for ideas on what might be going wrong. An issue that might have taken days to debug took about an hour to identify.
Think of it as rubber ducking with a very strong generalist engineer who knows about basically any technical concepts.
Like, yesterday I made some light changes to a containerized VPN proxy that I maintain. My first thought wasn't "how would Claude do this?" Same thing with an API I made a few weeks ago that scrapes a flight data website to summarize flights in JSON form.
I knew I would need to write some boilerplate and that I'd have to visit SO for some stuff, but asking Claude or o1 to write the tests or boilerplate for me wasn't something I wanted or needed to do. I guess it makes me slower, sure, but I actually enjoy the process of making the software end to end.
Then again, I do all of my programming on Vim and, technically, writing software isn't my day job (I'm in pre-sales, so, best case, I'm writing POC stuff). Perhaps I'd feel differently if I were doing this day in, day out. (Interestingly, I feel the same way about AI in this sense that I do about VSCode. I've used it; I know what's it capable of; I have no interest in it at all.)
The closest I got to "I'll use LLMs for something real" was using it in my backend app that tracks all of my expenses to parse pictures of receipts. Theoretically, this will save me 30 seconds per scan, as I won't need to add all of the transaction metadata myself. Realistically, this would (a) make my review process slower, as LLMs are not yet capable of saying "I'm not sure" and I'd have to manually check each transaction at review time, (b) make my submit API endpoint slower since it takes relatively-forever for it to analyze images (or at least it did when I experimented with this on GPT4-turbo last year), and (c) drive my costs way up (this service costs almost nothing to run, as I run it within Lambda's free tier limit).
Using it to generate blocks of code in a chat like manner in my opinion just never works well enough in the domains I use it on. I'll try to get it to generate something and then realize when I get some functional result I could've done it faster and more effectively.
Funny enough, other commenters here hate autocomplete but love chat.
Some years ago I gave a task to some of my younger (but intelligent) coworkers.
They spent about 50 minutes searching in google and came back to me saying they couldn't find what they were looking for.
I then typed in a query, clicked one of the first search results and BAM! - there was the information they were unable to find.
What was the difference? It was the keywords / phrases we were using.
LLMs are just a life saver. Literally.
They take my code time down from weeks to an afternoon, sometimes less. And they're kind.
I'm trying to write a baseball simulator on my own, as a stretch goal. I'm writing my own functions now, a step up for me. The code is to take in real stats, do Monte Carlo, get results. Basic stuff. Such a task was impossible for me before LLMs. I've tried it a few times. No go. Now with LLMs, I've got the skeleton working and should be good to go before opening day. I'm hoping that I can use it for some novels that I am writing to get more realistic stats (don't ask).
I know a lot of HN is very dismissive of LLMs as code help. But to me, a non programmer, they've opened it up. I can do things I never imagined that I could. Is it prod ready? Hell no, please God no. But is it good enough for me to putz with and get just working? Absolutely.
I've downloaded a bunch of free ones from huggingface and Meta just to be sure they can't take them away from me. I'm never going back to that frustration, that 'Why can't I just be not so stupid?', that self-hating, that darkness. They have liberated me.
As far as I know, the idea of a scratch "buffer" comes from emacs. But in JetBrains IDEs, you have full IDE support, even with context from your current project (you can pick the "modules" you want to have in context). Given the good integration with LLMs, that's basically what the author seems to want. Perhaps give GoLand a try.
Disclosure: no, I don't work for JetBrains :D just a very happy customer.
- be way more reliable
- probably be up to date on how you should solve it with the latest/recommended approach
- put you in a place where you can search for adjacent tech
LLM with search has potential, but I'd like it if current tools were oriented more toward source material rather than AI paraphrasing.
The first few steps were great. It guided me to install things and set up a project structure. The model even generated code for a few files.
Then something went wrong: the model kept telling me what to do in vague terms, but didn't output code anymore. So I asked for further help, and now it started contradicting itself, rewriting business logic that was implemented in its first response, producing 3-4 snippets of the same file that weren't compatible with each other, and it all fell apart.
or writing tests - that's ... not so helpful. the worst is when a lazy dev takes the generated tests and leaves it at that: usually just a few placeholders that test the happy path but ignore obvious corner cases. (I suppose for API tests that comes down to adding test case parameters)
but chatting about a large codebase, I've been amazed at how helpful it can be.
what software patterns can you see in this repo? how does the implementation compare to others in the organisation? what common features of the pattern are missing?
also, like a linter on steroids, chat can help explore how my project might be refactored to better match the organisation's coding style.
My experience with LLM code is that it can't come up with anything even remotely novel. If I say "make it run in amortized O(1)" then 99 times out of 100 I'll get a solution so wildly incorrect (but confidently asserting its own correctness) that it can't possibly be reshaped into something reasonable without a re-write. The remaining 1/100 times aren't usually "good" either.
For the reservoir sampler -- here, it did do the job. David almost certainly knows enough to know the limits of that code and is happy with its limitations. I've solved that particular problem at $WORK though (reservoir sampling for percentile estimates), and for the life of me I can't find a single LLM prompt or sequence of prompts that comes anywhere close to optimality unless that prompt also includes the sorts of insights which lead to an amortized O(1) algorithm being possible (and, even then, you still have to re-run the query many times to get a useful response).
Picking on the article's solution a bit, why on earth is `sorted` appearing in the quantile estimation phase? That's fine if you're only using the data structure once (init -> finalize), but it's uselessly slow otherwise, even ignoring splay trees or anything else you could use to speed up the final inference further.
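For context, a reservoir sampler keeps a fixed-size uniform sample of a stream. The minimal Algorithm R sketch below in Go is my own illustration, not the article's code; it shows where the `sorted` cost lives: sorting inside the quantile query is fine for one-shot use, but wasteful when queried repeatedly.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// Reservoir holds a uniform random sample of up to k values from a
// stream of unknown length (Algorithm R).
type Reservoir struct {
	k    int
	n    int // total values seen so far
	vals []float64
}

func (r *Reservoir) Add(v float64) {
	r.n++
	if len(r.vals) < r.k {
		r.vals = append(r.vals, v)
		return
	}
	// Replace a random slot with probability k/n.
	if j := rand.Intn(r.n); j < r.k {
		r.vals[j] = v
	}
}

// Quantile sorts a copy on every call -- exactly the cost being
// criticized above when the structure is queried more than once.
func (r *Reservoir) Quantile(q float64) float64 {
	s := append([]float64(nil), r.vals...)
	if len(s) == 0 {
		return 0
	}
	sort.Float64s(s)
	return s[int(q*float64(len(s)-1))]
}

func main() {
	r := &Reservoir{k: 100}
	for i := 0; i < 10000; i++ {
		r.Add(float64(i))
	}
	fmt.Printf("~median: %v\n", r.Quantile(0.5))
}
```

Keeping the reservoir incrementally sorted (or using a splay tree, as the comment suggests) amortizes that per-query sort away.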
I personally find LLMs helpful for development when either (1) you can tolerate those sorts of mishaps (e.g., I just want to run a certain algorithm through Scala and don't really care how slow it is if I can run it once and hexedit the output), or (2) you can supply all the auxiliary information so that the LLM has a decent chance of doing it right -- once you've solved the hard problems, the LLM can often get the boilerplate correct when framing and encapsulating your ideas.
It feels like the IDE needs a new mode to deal with this state, and that SCM needs to be involved somehow too. Somehow help the developer guide this somewhat flaky stream of edits and sculpt it into a good changeset.
So what we can get out of it is everything that has been written (and publicly released) before translated to any language it knows about.
This has some consequences.
1. Programmers still need to know what algorithms or interfaces or models they want.
2. Programmers do not have to know a language very well anymore to write code, but they have to for bug fixing. Consequently, the rift between garbage software and quality software will grow.
3. New programming languages will face a big economical hurdle to take off.
Are the results a paradigm shift so much better that it's worth the hundreds of billions sunk into the hardware and data centers? Is spicy autocomplete worth the equivalent of flying from New York to London while guzzling thousands of liters of water?
It might work, for some definition of useful, but what happens when the AI companies try to claw back some of that half a trillion dollars they burnt?
Then I needed to write a simple command line utility, so I wrote it in Go, even though I've never written Go before. Being able to make tiny standalone executables which do real work is incredible.
Now if I ever need to write something, I can choose the language most suited to the task, not the one I happen to have the most experience with.
That's a superpower.
Can't recommend aider enough. I've tried many different coding tools, but they all seem like a leaky abstraction over the LLM medium of sequential text generation. Aider, on the other hand, leans into it in the best possible way.
I like gptresearcher and all of the glue put in place to be able to extend prompts and agents etc. Not to mention the ability to fetch resources from the web and do research type summaries on it.
All in all, it reminds me of the work of security researchers, pentesters, and analysts. Throughout their careers they build up a set of tools and scripts to solve various problems. LLMs kind of force devs to create/select tools for themselves to ease the burden of their specific line of work as well. You could work without LLMs, but maybe it will be a bit more difficult to stand out in the future.
I'm probably in the same place as the author, using Chat-GPT to create functions etc, then cut and pasting that into VSCode.
I've started using cline which allows me to code using prompts inside VSCode.
i.e. Create a new page so that users can add tasks to a tasks table.
I'm getting mixed results, but it is very promising. I create a clinerules file which gets added to the system prompt so the AI is more aware of my architecture. I'm also looking at overriding the cline system prompt, both to make it fit my architecture better and to remove stuff I don't need.
I jokingly imagine in the future we won't get asked how long a new feature will take, rather, how many tokens will it take.
One thing that doesn't get a mention in the article but is, I think, quite significant: the long lag of knowledge cutoff dates. Looking at even the latest and greatest models, there is a year or more of missing information.
I would love for someone more versed than me to tell us how best to use RAG or LoRA to get the model to answer with fully up to date knowledge on libraries, frameworks, ...
My workflow puts LLM chat at my fingertips, and I can control the context. Pretty much any text in emacs can be sent to a LLM of your choice via API.
Aider is even better, it does a bunch of tricks to improve performance, and is rapidly becoming a 'must have' benchmark for LLM coding. It integrates with git so each chat modification becomes a new git commit. Easy to undo changes, redo changes, etc. It also has a bunch of hacks because while o1 is good as reasoning, it (apparently) doesn't do code modification well. Aider will send different types of requests to different 'strengths' of LLMs etc. Although if you can use sonnet, you can just use that and be done with it.
It's pretty good, but ultimately it's still just a tool for transforming words into code. It won't help you think or understand.
I feel bad for new kids who won't develop the muscle memory and the eye needed to read and write code. Because you still need to read/write code, and can't rely on the chat interface for everything.
I don't think this is about LLMs getting better, but about search becoming worse, in no small part thanks to LLMs polluting the results. Do an image search for any term and count how many results are AI generated.
I can say I got better result from Google X years ago vs Google of today.
I do mostly 2/ Search, which is like a personalized Stack Overflow and sometimes feels incredible. You can ask a general question about a specific problem and then dive into some specific point to make sure you understand every part clearly. This works best for things one doesn't know enough about, but has a general idea of how the solution should sound or what it should do. Or copy-pasting error messages from tools like Docker and having the LLM debug them for you, which really feels like magic.
For some reason I have always disliked autocomplete anywhere, so I don't do that.
The third way, chat-driven programming, is more difficult, because the code generated by LLMs can be large, and can also be wrong. LLMs are too eager to help, and they will try to find a solution even if there isn't one, and will invent it if necessary. Telling them in the prompt to say "I don't know" or "it's impossible" if need be, can help.
But, like the author says, it's very helpful to get started on something.
> That is why I still use an LLM via a web browser, because I want a blank slate on which to craft a well-contained request
That's also what I do. I wouldn't like having something in the IDE trying to second guess what I write or suddenly absorbing everything into context and coming up with answers that it thinks make a lot of sense but actually don't.
But the main benefit is, like the author says, that it lets one start afresh with every new question or problem, and save focused threads on specific topics.
I have to say that I am impressed with sketch.dev: it got me a working example on the first try, and the result looked similar to the others but somehow cleaner in terms of styling.
The whole time I was using those tools, I was thinking that I want exactly this: an LLM trained specifically on the Go official documentation, or whatever your favourite language is, ideally fine-tuned by the maintainers of the language.
I want the LLM to show me an idiomatic way to write an API using the standard library. I don't necessarily want it to do it instead of me, or to be trained on all of the data they could scrape. Show me a couple of examples, maybe explain a concept, give me step-by-step guidance.
I also share his frustrations with the chat-based approach. What annoys me personally the most is the anthropomorphization of the LLMs; yesterday Gemini was even patronizing me...
The fast iteration cycle of getting a baseline (even a less-than-ideal or completely wrong one) is a great point here. Redoing the work is fast and easy, but it still requires review and validation to know how to request the rework that obtains the optimal result.
I would write the tests first and foremost: they are the specification. They’re for future me and other maintainers to understand and I wouldn’t want them to be generated: write them with the intention of explaining the module or system to another person. If the code isn’t that important I’ll write unit tests. If I need better assurances I’ll write property tests at a minimum.
If I’m working on concurrent or parallel code or I’m working on designing a distributed system, it’s gotta be a model checker. I’ve verified enough code to know that even a brilliant human cannot find 1-in-a-million programming errors that surface in systems processing millions of transactions a minute. We’re not wired that way. Fortunately we have formal methods. Maths is an excellent language for specifying problems and managing complexity. Induction, category theory, all awesome stuff.
Most importantly though… you have to write the stuff and read it and interact with it to be able to keep it in your head. Programming is theory-building as Naur said.
Personally I just don't care to read a bunch of code and play "spot the error," a game that's rigged for me to be bad at. It's much more my speed to write code that obviously has no errors in it because I've thought the problem through. Although I struggle with this at times, the struggle is an important part of the process for acquiring new knowledge.
Though I do look forward to algorithms that can find proofs of trivial theorems for me. That would be nice to hand off… although simp does a lot of work like that already. ;)
I think at the same time, while the author says this is the second most impressive technology he's seen in his lifetime, it's still a far cry from the bombastic claims being made by the titans of industry regarding its potential. Not uncommon to see claims here on HN of 10x improvements in productivity, or teams of dozens of people being axed, but nothing in the article or in my experience lines up with that.
For those not in the know, I just learned today that code autocomplete is actually called a "Fill-in-the-Middle" (FIM) task.
Search has been neutral. For finding little facts it's been about the same as regular search. When digging in, I want comprehensive, dense, reasonably well-written reference documentation. That's not exactly widespread, but LLMs don't provide it either.
Chat-driven generates too much buggy/incomplete code to be useful, and the chat interface is seriously clunky.
I'd love to be able to tell my (hypothetical smalltalk) tablet to create an app for me, and work interactively, interacting with the app as it gets built...
Ed: I suppose I should just try it and see where cloud AI can take Smalltalk today.
But I'm completely unconvinced by the final claim that LLM interfaces should be separate from IDE's, and should be their own websites. No thanks.
Claude will often quickly generate tons and tons of useless code, using up its limit. I often find myself yelling at it to stop.
I was just working with it last night.
"Hi Claude, can you add tabs here.": <div>
<MainContent/>
<div/>
Claude will then start generating MainContent.
DeepSeek, despite being free, does a much better job than Claude. I don't know if it's smarter, but whatever internal logic it has is much more to the point.
Claude also has a very weird bias towards a handful of UI libraries it has installed, even if those wouldn't be good for your project. I wasted hours on shadcn UI, which requires a very particular setup to work.
LLMs are generally great at common tasks in a top-5 (by popularity) language.
Ask one to do something in a Haxe UI library and it'll make up functions that *look* correct.
Overall I like them; they definitely speed things up. I don't think most experienced software engineers have much to worry about for now. But I am really worried about juniors. Why hire a junior engineer when you can just tell your seniors they need to use Copilot to crank out more code?
Most editors I use support online LLMs, but they're sometimes too slow for me.
Hot take of the day, I think making tests and refactors easier is going to be revolutionary for code quality.
1) Idea
2) Tests
3) Code until all tests pass
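That loop can be sketched in Go's table-driven style. `Slugify` below is a hypothetical example function, not anything from the thread: the cases were written first (step 2), then the body filled in until they pass (step 3):

```go
package main

import (
	"fmt"
	"strings"
)

// Slugify is a hypothetical function used to illustrate the
// tests-first loop: the table below existed before this body did.
func Slugify(s string) string {
	s = strings.ToLower(strings.TrimSpace(s))
	return strings.ReplaceAll(s, " ", "-")
}

func main() {
	// Step 2: the table of cases IS the specification.
	cases := []struct{ in, want string }{
		{"Hello World", "hello-world"},
		{"  trim me ", "trim-me"},
	}
	// Step 3: code until all cases pass.
	for _, c := range cases {
		if got := Slugify(c.in); got != c.want {
			fmt.Printf("FAIL: Slugify(%q) = %q, want %q\n", c.in, got, c.want)
			return
		}
	}
	fmt.Println("all cases pass")
}
```

With an LLM in the loop, the table is what you review carefully; regenerating the function body until the table passes is the cheap part.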
Does webflow have something?
My problem is being able to describe what I want in the style I want.
Properly using them requires understanding that. And just as we understand that not every query will find what we want, neither will every prompt. Iterative refinement is virtually required for nontrivial cases. Automating that process, as e.g. the Cursor agent does, is very promising.