Why Not Comments
The article emphasizes the significance of "why not" comments in programming, highlighting their role in explaining decision-making and trade-offs that identifiers alone cannot convey, while questioning self-documentation limits.
The article discusses the importance of "why not" comments in programming, emphasizing that while code is structured and limited in expressiveness, comments can convey more nuanced information. The author argues that comments should not only explain what the code does but also highlight the reasoning behind certain decisions, particularly when trade-offs are made. An example from the author's work on "Logic for Programmers" illustrates this point, where a slow but simple solution was chosen for replacing math symbols in an epub build. The comment serves as a reminder of the decision-making process and potential future implications if the codebase grows. The author critiques the idea that all necessary information can be embedded in identifiers, noting that identifiers cannot encapsulate complex trade-offs or negative information. The article concludes by pondering whether "why not" comments represent a broader challenge in self-documentation, suggesting that certain abstract concepts may inherently resist self-documentation.
- "Why not" comments provide context for decision-making in code.
- Comments can highlight trade-offs that identifiers cannot convey.
- The author uses a practical example from their work to illustrate the concept.
- Self-documentation may struggle with conveying negative information.
- The article raises questions about the limits of self-documentation in programming.
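As a rough illustration of the kind of comment the article describes, here is a minimal Python sketch. It is not the author's actual build code: the `SYMBOLS` table, the macro names, and `replace_math_symbols` are hypothetical, chosen only to show a slow-but-simple approach paired with a comment explaining why the faster alternative was not used.

```python
# Hypothetical macro-to-symbol table; the real build's contents are not
# shown in the article.
SYMBOLS = {
    r"\land": "∧",
    r"\lor": "∨",
    r"\implies": "⇒",
}


def replace_math_symbols(text: str) -> str:
    """Replace each known macro in `text` with its Unicode symbol."""
    # Why not a single-pass regex? One str.replace per macro rescans the
    # whole text each time, so cost grows with len(SYMBOLS) * len(text).
    # With a handful of macros that is fast enough, and this version is
    # trivial to read. Revisit if the table grows large or the build
    # becomes noticeably slow.
    for macro, symbol in SYMBOLS.items():
        text = text.replace(macro, symbol)
    return text
```

The comment records the rejected alternative and the condition under which the decision should be revisited, which is information no identifier could carry.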
Related
Self Documenting Code Is Bullshit
Klaus Breyer challenges self-documenting code, advocating for external documentation to enhance precision and abstraction. He emphasizes the need for detailed information like variable units and invariants, promoting a balanced approach for code clarity.
The Documentation Tradeoff
Kent Beck's article discusses the complexities of software documentation, emphasizing effective communication over excessive documentation. He critiques "self-documenting code" and the neo-waterfall approach, advocating for alternatives like discussions and tests.
Features I'd like to see in future IDEs
Proposed improvements for IDEs include queryable expressions for debugging, coloration of comments, embedding images, a personal code vault for snippets, and commit masks to prevent accidental code commits.
Against Names
The article explores the challenges of naming in computer science, highlighting anonymous identifiers in version control and utility CSS as ways to simplify workflows while balancing named and unnamed elements.
Explicit is better than implicit
The article highlights that explicit coding enhances readability and maintainability, reduces confusion, and improves collaboration by clearly defining variables and access controls, ultimately leading to better code quality.
- Many programmers emphasize the need for comments that explain the reasoning behind non-obvious code choices, particularly in complex situations.
- There is a consensus that comments should focus on the "why" rather than the "what," as the latter can often be inferred from the code itself.
- Some commenters express frustration with excessive or redundant comments, advocating for a balance between clarity and conciseness.
- Several users highlight the risk of comments becoming outdated or misleading, stressing the importance of maintaining them alongside code changes.
- Many agree that comments serve as valuable documentation for future maintainers, helping to clarify decisions made during development.
"A junior engineer writes comments that explain what the code does. A mid-level engineer writes comments that explain why the code does what it does. A senior engineer writes comments that explain why the code isn't written in another way."
(except punchier, of course. I'm not doing the quip justice here)
What's not so useful: mandatory comments. A public API should be thoroughly documented, but some shops insist on writing comments for every function in the code, even private ones and even if its purpose is so obvious that the comment just rephrases its name. This practice is not only a waste of time, but also desensitizes you to comments and teaches you to ignore them.
Other wasteful comments are added by some tools. I hate the one that marks every loop with a //for or //try comment.
DEAR MAINTAINER:
This code is the way it is because of <reasons go here>.
Once you are done trying to 'fix' this, and have realised what a terrible
mistake that was, please increment the counter as a warning to the next
person:
total_hours_wasted_here = n
I'm not the original author, but have gratefully used it once or twice, and been amused when there was a single line commit incrementing the counter.
This especially applies to your own code that you write and still have to maintain 5, 10, 15 years later. Just the other day I was reviewing a coworker's new code and thought "why choose to do it this way?" when the reason was 10 lines up where I did it the same way, 8 years ago. She was following the cardinal rule of maintenance - make the code look like the existing code.
Comment on whatever would be surprising when you read the code.
When I write code, a voice in the back of my head constantly asks “will I understand this code later?”. (People who just instinctively answer ‘yes’ every time are arrogant and often wrong.) Whenever the answer is ‘not sure’, the next obvious question is “why not?”. Answering that question leads you directly to what you need to write in your comment.
Sometimes the answer is “because the reader of the code might wonder why I didn't write it another way”, and that's the special case this article covers. But sometimes the answer is “because it's not obvious how it works or why it's correct” and that clearly requires a different type of comment.
Identifiers can go a _long_ way, but not _all_ the way. I personally am a fan of requiring documentation on any public methods or variables/fields/parameters (using jsdocs/xmldoc/etc). Having a good name for the method is important, but having to write a quick blurb about what it does helps to make it even clearer, and more importantly, points out obvious flaws:
* Often even the first sentence will cause you to realize there's a better name for the method
* If you start using "and" in the description, it is a good indication that the method does too much and can be broken down in a more logical way
People often think properties are so clear they don't need docs, then write things like:
/** The API key */
string ApiKey;
But there's so much missing: where does this key come from? Is this only internal or is it passed from/to external systems? Is this required, and can it be null or empty? Is there a maximum length? What happens if a bad value (whatever that is) is used? Is there a spot in code or other docs where I could read more (or where all these questions are already answered)?
This is stuff that as the original author of the code you know and can write in a minute or two, but as a newcomer -- whether modifying it, using it, or just parachuted in to fix a bug years later -- could take _hours_ to figure out.
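To make that concrete, here is one possible sketch of a field whose comment answers those questions. Everything specific in it is an assumption made up for illustration: the `BillingClientConfig` class, the billing service, the source of the key, and the `MAX_API_KEY_LENGTH` limit are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical limit, chosen only for illustration.
MAX_API_KEY_LENGTH = 64


@dataclass(frozen=True)
class BillingClientConfig:
    """Settings for a hypothetical billing-service client."""

    # Secret sent to the external billing service on every request.
    # Source: loaded from deployment secrets at startup (assumption).
    # Required: must be a non-empty string of at most
    # MAX_API_KEY_LENGTH characters; never None.
    # Bad values: rejected here with ValueError rather than failing
    # later as an opaque authentication error.
    api_key: str

    def __post_init__(self) -> None:
        if not isinstance(self.api_key, str) or not self.api_key:
            raise ValueError("api_key is required and must be non-empty")
        if len(self.api_key) > MAX_API_KEY_LENGTH:
            raise ValueError("api_key is longer than MAX_API_KEY_LENGTH")
```

The checks in `__post_init__` back up the claims the comment makes, so the two are less likely to drift apart.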
Another twist on this is to put in a debug logging statement which triggers when the inputs are much larger than the original design constraints.
It's roughly the same message to a future-developer, but they might find it much sooner, short-circuiting even more diagnostic and debugging time.
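A minimal sketch of that idea in Python, using the standard `logging` module. The function `find_duplicates`, the `DESIGN_LIMIT` of 50, and the choice of the warning level are assumptions for illustration, not taken from any particular codebase.

```python
import logging

logger = logging.getLogger(__name__)

# Assumed design limit: the pairwise scan below was written for small inputs.
DESIGN_LIMIT = 50


def find_duplicates(items: list) -> list:
    """Return items that appear more than once, in first-seen order."""
    if len(items) > DESIGN_LIMIT:
        logger.warning(
            "find_duplicates got %d items (designed for <= %d); "
            "the scan is O(n^2), consider a faster approach",
            len(items),
            DESIGN_LIMIT,
        )
    duplicates = []
    for i, item in enumerate(items):
        # Quadratic on purpose: items only need to support ==, not hashing.
        if item in items[i + 1:] and item not in duplicates:
            duplicates.append(item)
    return duplicates
```

Unlike a comment, the message shows up in the logs at the moment the original assumption stops holding, without anyone having to open the file first.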
I know most people don't like it, and that is fine, they can deal with it! If they don't want to see my comments, they can remove them from their version of my code with a script, and if my co-workers and boss don't like them they can remove them in a code review! However, I can say that I enjoy reading my old code way more than I enjoy reading others' code that has zero comments. I work in Python, so a lot of the simple non-algorithm code (boilerplate stuff for apps, like flask APIs for example) is mostly "self-documenting" since, as the old saying goes, "write some pseudo-code and 95% of the time it runs in Python." The most important comments are sometimes on the boilerplate stuff because that's where a lot of changes happen, versus the algorithms, where I find there is a lot more wholesale rewriting in my industry.
I will always love comments and doc comments!
If a piece of code is weird, or slow, or you'd say "yeah, it's kinda janky" when describing to somebody, I usually write a comment about it. Especially if I've changed it before; to document some case that didn't work, or I fixed, or whatever.
When you operate on this basis, superfluous comments just melt away, and you typically end up documenting 'why' only when it's really necessary.
Try it out in your own codebase for a month and see how it feels :)
You can't have functional code for what isn't done, so that's some information you can't express in code.
Furthermore, a major problem with comments is that you can't debug or test them. There are many tools that can analyze code, static or runtime, but because comments are just free text, you can't do much besides spellchecking. Also it is common to forget to update comments, and with the lack of testing, in the end, you get lies.
But here, the only maintenance these comments need is to remove them if they stop being relevant. For example because you finally did the thing you said you wouldn't do, or just scrapped the part. Very low effort, also rather easy to spot, since if you see a thing being done with an explanation on why it is not done, it means someone forgot.
It is also worthwhile because as programs grow, they tend to do more, and a "not" assumption may no longer hold (there used to be 4 parameters, now there are 10000...), meaning it is something you should check.
A lot of slow code comes from an assumption that N will be small, and N ends up being big. By the way, that's why I favor using the algorithm with the lowest reasonable time complexity, even if it is overkill and slower on small sets. Because one day, the set may not be small. If for some reason using a low-complexity algorithm is problematic, for example because the code is too complex or the slowdown on smaller sets too great, then comes the "why not" comment.
Then I realized that my languages will never be perfect, and having comments is an essential escape hatch. I was wrong and I changed my mind.
Also, 99.9% of languages have comments:
I've been in this exact situation quite a few times — use a bad algorithm because your n is low. However, instead of commenting, I did something like this instead:
function doStuff(items: Item[]) {
  if (items.length > 50) {
    logger.warn("there's too much stuff, this processing is O(n^2)!");
  }
  // ... do stuff
}
Wow, someone actually suggested that?! Do people write whole programs like this?
My biggest headache right now has been getting high throughput with SQL, and as such I've had to do a lot of non-obvious things with batching and non-blocking IO in Java to get the performance I really need, so a lot of the "obvious" solutions don't work (at least with a reasonable amount of memory). Consequently I've been pretty liberally commenting large segments of my code so that someone doesn't come in and start bitching about how "bad" my code is [1], "fix" it, and then make everything worse by rewriting it in a more naive way that ends up not fulfilling the requirements.
[1] I have since stopped doing this, but I'm certainly guilty of doing this in the past.
There's often a fairly small kernel of very dense code that abstracts away a bunch of complexity. That code tends to have well north of a 1:1 comment to code ratio, discussing invariants, expectations, which corner cases need special handling and which ones are solved through the overall structure, etc.
Then there's a bunch of code that builds on that kernel, that is as close to purely declarative as possible, and aims for that "self-documenting code that requires no comments" ideal.
Finally, there's the business logic-y code that just can't be meaningfully abstracted and is sometimes non-obvious. Comments here are much more erratic and often point at JIRA tickets, or other such things.
- URL of documentation for a complex feature
- URL with a dashboard with telemetry
- URL of a monitor which checks if the given feature, CI job, GitHub action etc. works correctly.
In a big project, figuring this stuff out is not trivial, requires a lot of searching with proper search terms and/or asking the proper knowledgeable person.
I find it weird that code is often so detached from everything non-code.
I often add comments to code as I decipher it, then remove them again when I figure things out.
Every function (or procedure) starts with a comment block. It first talks about the what and why. Then, a line for the inputs and another for the outputs. Next -- and this is done closer to the end of the writing -- I describe what it calls and what it is called by. The comment block optionally finishes with room for improvement.
The function itself probably has other comments. Usually for anything which is not blindingly obvious. Because I write code like a caveman, wherein only one thing happens on one line, most everything is quite clear. If there's anything weird or magical that has to happen, it gets a comment.
Elegance and cleverness is reserved for data structures, algorithms, and so on, rather than doing a lot of stuff in as few lines as possible. I do this for Future Me, who might be having a bad day, or for anyone who wants to adapt my code to something else.
One of the last steps in a finished program is going through and making sure that my comments match my code. I am a very boring kind of programmer.
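One possible reading of that layout, sketched in Python. The function `parse_duration`, its behaviour, and the caller `load_config` named in the block are hypothetical; only the shape of the comment block (what and why, inputs, outputs, calls, called by, room for improvement) follows the description above.

```python
# parse_duration
# What/why:  Convert a string such as "1h30m" into seconds, so that
#            config files can use readable durations.
# Inputs:    text -- digits followed by a unit (h, m, or s), repeated.
# Outputs:   total seconds as an int; raises ValueError on bad input.
# Calls:     nothing outside the built-ins.
# Called by: load_config (hypothetical caller, not shown).
# Room for improvement: no support for days or fractional values.
def parse_duration(text):
    units = {"h": 3600, "m": 60, "s": 1}
    total = 0
    digits = ""
    for ch in text:
        if ch.isdigit():
            digits = digits + ch
        elif ch in units and digits:
            total = total + int(digits) * units[ch]
            digits = ""
        else:
            raise ValueError("bad duration: " + repr(text))
    if digits:
        # Trailing digits with no unit are ambiguous, so reject them.
        raise ValueError("bad duration: " + repr(text))
    return total
```

The body keeps to one step per line, so the only inline comment is on the one branch whose purpose is not obvious from reading it.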
In other words, the comment allows the author to reach into the future and co-debug with the reader, even if the author is no longer there.
People rarely touch what I write, but if they do, and they want to strip the comments out, that's totally fine with me, just don't ask me how it works after you do :P
He did not believe in comments, much. I think he thought he was commenting the hard stuff, mēh, I am unsure.
It has led me into strong opinions.
* Document every function. The function should make clear what the preconditions, postconditions, and the purpose are
* Document every file/code unit. Why it exists
* Document important loops like functions
* Document the easy stuff. It is not easy if unfamiliar
* Review the comments when working on code
This would have saved my company about thirty percent of my time.
The compiler does not verify comments, like it does code, so it is a burden. Bad programmers get another opportunity to sow chaos, I know. But one of the main purposes of code is communication with the humans who follow, as well as controlling machines.
Careful and thoughtful comments are a professional obligation IMO
Doesn't seem to be a problem here though because they're replacing macros by symbols that are known ahead of time.
For instance, you might write something like:
# I used a bubble sort instead of a quick sort here because
# the constraint above this guarantees there will never be
# more than 5 items, so it's faster to use the naive
# algorithm than to implement a more complex algorithm that
# involves more branching.
or
# Normally we'd do X, but that broke customer Y's use case
# based on their interpretation of our API docs which we
# had kind of messed up. So now we do Y because it works
# under both interpretations, at least until we can get
# them to upgrade.
Basically, tell your audience why you're not using the expected method. It's not because you didn't know about it, but because you do know and you've determined that it's not a good fit for this use case.
I feel like I’m getting off the self-documenting code ride. In our own codebase we rely way too much on “descriptive names”. Like full-on sentence-names. And is the code self-documenting? Often not. You indeed cannot describe three or more axes of concerns in one name.
Do comments go stale? Well, why do they? Code reviews that are too loose? Pull requests with fifty lines of diff noise that you glaze over? We have the tools to do better on that front than some years ago at least.
It’s a joy to find a corner of the code base where things are documented with regular sentences. Compared to having to puzzle through five function call layers.
[1] But yeah, really. But also: sometimes also in comments. Sometimes both.
def fetch_data(comment_id):
    """
    Args:
    - comment_id: The id of the comment to fetch.
    Returns: The comment data
    """
    # Fetch comment data
    data = fetchCommentData()
Who is arguing this? Usually, if I'm adding a comment on why something is done a particular way, it's something that is going to take at least a full sentence to explain if not a whole paragraph.
Sure people can tell what the code is doing over a file but comments could definitely add a little more context of what the code is _for_.
It seems only to be a thing with Jupyter notebooks, and even there it mostly describes the results and not the code.
It makes it way easier to send these into Claude (it seems, at least). I hope they introduce a semantic/vibes search too as I can never remember what I name my classes and functions...
The problem with comments is that they can become stale, and it's often possible to self-document or write simpler code that causes less surprise. But of course, it's totally fine to put comments.
And I think comments should be mandatory for interface functions/types unless their behavior is obvious. I don't want to read the code to understand what a function does, or what invariant a class maintains. And if it's too complex to document in a few lines, this probably isn't the right interface. But apparently, this isn't obvious to everybody. In my company, most of the code isn't documented.
I nearly exploded trying to grok this
Or you know, you could have just used a hyphen instead of clickbaiting.
I admire the honesty, but will continue to phrase these "why not" comments as insincere TODOs.
When commits are rebased, the log message must be revisited and revised. Changes can disappear on rebasing; e.g. when a change goes into a baseline in which someone else made some of the exact same changes in an earlier commit, so that the delta to the new parent is a smaller patch. In my experience, commit messages stay relevant under most rebasing.
Comments are (largely) an obsolete version of version control log messages.
In the 1980s, there was a transitional practice: write log messages, but interpolate them into the checked out code with the RCS $Log$ thing. This was horrible; it practically begs for merge conflicts. It was understandable why; version control systems were not ubiquitous, let alone decentralized. You were not getting anyone's RCS ",v" file or whatever.
Today, we are a few decades past all that. No $Log$ and few comments.
Mainly, the comments that make sense today are ones which drive automatic API documentation. It would not be reasonable to reconstruct that out of the git history. These API comments must be carefully structured so the documentation system can parse them, and must be rigorously maintained up-to-date when the API changes.