July 1st, 2024

Code Reviews Do Find Bugs

Code reviews are effective in finding bugs, despite past doubts. Research shows reviewers identify defects, with small code chunks being most efficient. Reviews aid learning, improve maintainability, and promote knowledge sharing.

Read original articleLink Icon
Code Reviews Do Find Bugs

Code reviews are effective in finding bugs despite a 2015 Microsoft research paper suggesting otherwise. The study found that reviewers do identify defects, with an average of four comments per review pointing out bugs. Additional research indicates that code reviews and pair programming can uncover 60% more defects with only a 15% increase in time investment. Reviewers are most effective when examining small code chunks, detecting roughly one defect every ten minutes within the first 60 minutes of review. Furthermore, code reviews aid in learning the codebase, with reviewers becoming more proficient over time. While the Microsoft paper questions the value of code reviews, it acknowledges their benefits in improving long-term code maintainability and promoting knowledge sharing. Despite concerns about review turnaround times, the overall consensus is that properly managed code reviews are highly effective in identifying defects and enhancing team knowledge. The paper suggests that rather than dismissing best practices, organizations should ensure they are implemented correctly before evaluating their effectiveness.

Related

Software Engineering Practices (2022)

Software Engineering Practices (2022)

Gergely Orosz sparked a Twitter discussion on software engineering practices. Simon Willison elaborated on key practices in a blog post, emphasizing documentation, test data creation, database migrations, templates, code formatting, environment setup automation, and preview environments. Willison highlights the productivity and quality benefits of investing in these practices and recommends tools like Docker, Gitpod, and Codespaces for implementation.

Getting 100% code coverage doesn't eliminate bugs

Getting 100% code coverage doesn't eliminate bugs

Achieving 100% code coverage doesn't ensure bug-free software. A blog post illustrates this with a critical bug missed despite full coverage, leading to a rocket explosion. It suggests alternative approaches and a 20% coverage minimum.

Misconceptions about loops in C

Misconceptions about loops in C

The paper emphasizes loop analysis in program tools, addressing challenges during transition to production. Late-discovered bugs stress the need for accurate analysis. Examples and references aid developers in improving software verification.

Readability: Google's Temple to Engineering Excellence

Readability: Google's Temple to Engineering Excellence

Google's strict readability process involves code approval by maintainers and readability mentors, shaping coding standards. Despite criticism, it enhances skills, maintains quality, and fosters global code consistency. A proposed "Readability Lite" aims for mentorship and quality without strict enforcement.

A Bunch of Programming Advice I'd Give to Myself 15 Years Ago

A Bunch of Programming Advice I'd Give to Myself 15 Years Ago

Marcus offers programming advice emphasizing prompt issue resolution, balancing speed and quality, improving tool proficiency, deep bug investigations, leveraging version control, seeking feedback, and enhancing team collaboration for optimal problem-solving.

Link Icon 31 comments
By @willio58 - 7 months
Agreed. I mainly manage and review code at this point in my career. I find many bugs, every once in a while finding something that would have caused an outage or notable problem for users.

What I find more though is code that isn't thought through. Tech debt and code smell are real, and they affect the performance of a team. Nipping that in the bud takes quality PR reviews and time to meet with submitters around issues you find.

Knock on wood but working at the company I do now where I, along with my team, have made quality PR reviews normal.. our codebase is now enjoyable and fun to work on. I highly recommend it!

One key aspect is being “kind, not nice”. Be helpful when leaving comments in PRs, but don’t be nice for the sake of avoiding conflict.

Also if you find code reviews to be a waste of time I can reccomend one thing I do often - give warnings. I approve and give comments around things I’d like to be fixed in the future for similar PRs. I don’t hold up the merge for little things, but at the same time I won’t let the little things slide forever

By @godelski - 7 months
I think this is really important in that it is bigger than "code reviews." It does show how people greatly misunderstand statistics[0]. And what's even funny is at surface level the claim that code review "does nothing" __sounds__ ludicrous. But people "believe" because they are annoyed with code review, not because they "actually" believe the results.

But statistics are tricky. With the example given in the article "15% of smokers get lung cancer" compared to "80% of people with lung cancer smoke." These two are not in contradiction with one another but are just different ways to view the same thing. In fact, this is often how people will mislead you (or how you may unintentionally mislead yourself!) with statistics.

Another famous example is one that hits HN every once in awhile: "Despite just 5.8% sales, over 38% of bug reports come from the Linux community"[1]. In short this one is about how linux users are just more trained to make bug reports and how most bugs are not system specific. So if you just classify bugs by the architecture of those submitting them, you'll actually miss out on a lot of valuable information. And because how statistics work, if the architecture dependence rate was as low as even 50% (I'd be surprised!) then that's still a huge amount of useful bug reports. As a linux user, I've seen these types of bugs, and they aren't uncommon. But I've frequently seen them dismissed because I report from a linux system. Or worse, support sends you to their page that requests you to "upvote" a "feature" or bug issue. One you have to login to. I can't take a company like that seriously but hell, Spotify did that to me and I've sent them the line of code that was wrong. And Netflix did it to me saying "We don't block firefox" but switching user agents gave me access. Sometimes we got to just think a bit more than surface level.

So I guess I wanted to say, there's a general lesson here that can be abstracted out.

[0] Everyone jokes that stats are made up, but this is equally bad.

[1] https://news.ycombinator.com/item?id=38392931

By @poikroequ - 7 months
The value of code reviews really depends on the code and the person working on the code. For a team who have spent years working on the same repo, code reviews may not hold much value. But if you have a new guy on the team, or a junior, you'll definitely want to review their code.

Code reviews can also do more than just find bugs. You can point out a better way of doing things. Maybe this SQL could be more efficient. Maybe you can refactor some bit of code to make it more robust. Maybe you should put a logging statement here. This method name is confusing, may I suggest renaming it to xyz?

By @jt2190 - 7 months
I’m not sure why the author ignores the “… that should block a submisson” part.

The abstract of the paper:

> Because of its many uses and benefits, code reviews are a standard part of the modern software engineering workflow. Since they require involvement of people, code reviewing is often the longest part of the code integration activities. Using experience gained at Microsoft and with support of data, we posit (1) that code reviews often do not find functionality issues that should block a code submission; (2) that effective code reviews should be performed by people with specific set of skills; and (3) that the social aspect of code reviews cannot be ignored. We find that we need to be more sophisticated with our guidelines for the code review workflow. We show how our findings from code reviewing practice influence our code review tools at Microsoft. Finally, we assert that, due to its costs, code reviewing practice is a topic deserving to be better understood, systematized and applied to software engineering workflow with more precision than the best practice currently prescribes.

“Code Reviews Do Not Find Bugs: How the Current Code Review Best Practice Slows Us Down”

https://www.microsoft.com/en-us/research/wp-content/uploads/...

By @topkai22 - 7 months
Code reviews don’t just find bugs, they prevent them from being introduced in the first place.

Developers are more careful about what they write and submit when they know they’ll have someone else looking at it.

We went through a couple iterations of our code review policy on a multi-year project a while back. We never really saw code reviews catch a substantial number of bugs over time, but whenever we pulled back on code reviews we definitely saw the production error rate go up.

By @epolanski - 7 months
My beef with code reviews is that often they lead to tremendous amounts of wasted time, that's many thousands spent in a single week sometimes for simple pull requests.

Working from 6 years, not much, and not in as many places like others, I have built the opinion that code reviews are like tests, they should be used as a tool when they are necessary, they shouldn't be the default for every change.

In best case scenarios is the person creating the pull requests that requests reviews or decides the place needs tests.

My opinion obviously applies to product software, for libraries, especially when publicly exposed you want as much discussions and tests as possible.

By @swatcoder - 7 months
As with most processes, the dilemma with code reviews is in figuring out how they impact your team and your organization.

In a huge org, with thousands of engineers that's already burdened by hours per day of interruptions and process overhead, and release runways that already involve six stamps of bureaucracy, mandatory code revies have very little downside (it's in the noise) but highly variable return (many people are just droning under the weight of process). The org loses nothing much for mandating it, but only certain teams will see a lot of value for it.

On the other extreme, a startup with five engineers will get backlogged with reviews (which then get shortchanged) because everbody either is under pressure to either stay in their high-productivity flow or put out some pressing fire. The reviews probably could catch issues and share critical knowledge very regularly, but the org pays a pronounced penalty for the overhead and interruptions.

People long for "one size fits all" rules, crafting essays and publishing research papers to justify them, but the reality of what's right is often far more idiosyncratic.

By @david2ndaccount - 7 months
In my experience, code reviews catch a lot of bugs. However, if you find yourself catching the same kind of bugs over and over again in review you should be finding ways to catch them automatically without involving a reviewer (static analysis, tests, linters, etc.)
By @rebeccaskinner - 7 months
I think the article is taking the wrong view. The statistic cited by the article that 15% of comments were about a bug seems in line with expectations, and I think it would only really be worth discussing if the number were _much higher_ or _much lower_.

Instead, I think there are two far more interesting questions to ask:

1. Is the rate at which code review identifies defects sufficient to use code review as a detection mechanism for defects?

After nearly 20 years of writing software, I'm pretty convinced that the answer here is no. Some reviewers are better than others, and some circumstances are more favorable to finding defects than others, but we should generally try to build processes that don't assume defects will be caught at a substantial rate by code review. It's nice when it works, but it's not a reliable enough way to catch errors to be a load bearing part of the process.

2. Is mandatory review of all code justified?

This is the one I'm on the fence about. In an environment where code reviews are high priority, people are trained to review effectively, and there are minimal organizational politics at play, then I hypothesize that allowing PR authors to decide whether to get a review or not would generally improve quality and velocity because code would ship more quickly and code that would benefit from a review would still be reviewed. In that scenario, I think we'd see the benefits of getting things shipped more quickly when they don't require a review, and reviews would be higher quality because code being flagged for review would be a positive sign to pay more attention.

Unfortunately, I could be wrong, and it's not the sort of experiment anyone wants to risk their reputation pushing for, so I doubt we're likely to see an experiment at a large enough scale to know for sure. If we're going to fail one way or another, I'd prefer to fail by doing too much code review rather than not enough.

By @bluGill - 7 months
Bugs are 'easy' to fix, I don't worry about finding them. I worry about the interfaces as they quickly become a nightmare to change just because of all the users.
By @29athrowaway - 7 months
If you have a spellchecker, code formatter and a linter, code reviews improve significantly. Much better than having to do that work by hand, or reviewing it by hand, leaving code reviews for higher level ideas.
By @BlackFly - 7 months
Bear in mind I am pro code review, but...

There is a trick in pharmaceutical research where you test a potential candidate drug against placebo to yield a bad study that seems to show benefit. The reason it is a trick is because in many cases the alternative isn't placebo, it is an existing treatment. Then doctors learn about a more "modern" treatment, favor it for being modern and the better treatment may not be prescribed.

The alternatives to code review aren't doing nothing. The article claims that code reviews find a defect per 10 minutes--but only in the first ten minutes. By this same argument (ignore qualifications, extrapolate the numeric result), fast automated testing can potentially find thousands of defects in a second--if they run that quickly and the defects were already tested for. Static analysers, pair programming, documentation these are all alternatives and there are many more.

If you're spending an hour a day reviewing code then you are spending 12.5% of your time doing it. Using it that way comes with an opportunity cost that may be better spent depending on your particular organization and code base. Of course, analysing everything to death also has an opportunity cost, but not analysing it generally leads to moving goal posts where the supposed rationale for doing something keeps changing. Today its purpose is uncovering defects, tomorrow it is knowledge sharing, the day after it is security. It is all of those things, but other practices may achieve these goals with more effective use of time and people's patience.

So why am I pro code review? Because choosing to interact and work together as a team, to learn about and compromise with your colleagues makes for good team building while serving other technical purposes. I do think that pair programming can achieve this to a greater level while also being more taxing on individuals. This assumes you control the process and own it though, if it has just become a rote ceremony then my feelings are you probably aren't net benefitting from it: you are simply doing it because you have no choice, not because you believe it to be a valuable use of time. If you have experienced both, a culture where people choose and find value in code reviews and a culture where people are forced to do it unquestioningly, then you may have witnessed how a dicta can destroy the prosocial value of a practice.

By @nitwit005 - 7 months
> Developers spend six hours per week reviewing. This is a bit too much

It's extremely difficult to adjust the time spent on reviews. The options are unattractive. Do you start blindly accepting changes when you hit the limit, or just stop and not let people merge code?

By @some_furry - 7 months
> During the first 60 minutes of code review of the day, the reviewer finds roughly one defect per ten minutes of reviewing – as long as they review less than about 50 lines of code per ten minutes.

Oh.

It normally takes me a few seconds to find bugs in code.

I always felt this was average performance for assessing software. If the average time is ten minutes per defect, I need to recalibrate my expectations for myself.

By @grumple - 7 months
I find things wrong with virtually every nontrivial pull request when I’m the reviewer. Sometimes these are minor issues, but I spot bugs and edge cases all the time.

I see some comments about time. How long does a code review take? I can review hundreds of lines of code in a few minutes. It is much easier to review code than to write code imo, especially as you gain experience. For bigger efforts, get eyes on it throughout the process.

I’ve met a lot of developers who assume their code will just work right after they write it. They don’t test it, via code or manual qa. Then they act surprised when the stakeholder tells them it doesn’t work. Do the job right the first time. Slow is smooth and smooth is fast.

By @mgreene - 7 months
The paper's title is a bit provocative but I think the findings are interesting. Mainly around long-held beliefs about what developers perceive as the value vs what is actually happening.

You do bring up a good point about using change defect rate though. I wish the researchers had cited that as the preferred unit of measurement. I did some research on change defect rates on popular open source projects and it's all over the map. Ranging from ~12 - ~40% [1].

The future I'd like to see is as developers we use objective measures to justify time investment for review. This is going to be increasingly important as agents start banging out small bug-fix tickets.

[1] https://www.shepherdly.io/post/benchmarking-risk-quality-kpi...

By @sarchertech - 7 months
I remember a time before you needed an approval to merge a PR (I also remember a time before PRs or any widespread version control system).

I can count on one hand the number of times someone has caught a bug in my code that should have stopped deployment. Not that I haven’t deployed serious bugs to deployment, but they’ve almost never been caught by someone reviewing my code.

Occasionally someone suggests a better way to do something, or asks a question that ends up with me coming up with a better way of doing something. But those situations are also rare. And I can’t think many times at all when the impact was worth the time spent on the process.

Pair programming and collaboration can be hugely beneficial, but the minimal effort PR approval culture we’ve developed is a very poor substitute.

By @sys_64738 - 7 months
There are various levels to code reviews. Code review tools that are web based are pretty poor in my experience. Anything more than a few lines across multiple files needs a cscope type tool.

Also what type of review? Is this a prototype needing a high level design review so that the actual review doesn’t turn into a design review? How often does that occur?

Who are the reviewers and what’s the process? Key stakeholders have more influence and you need to consider the reviewer’s experience, knowledge and credibility.

Finally how important is the code? Is it kernel code, or high execution daemon code needing race condition and memory leak checking? Are you using static analysis for the code? Does the code even compile and do what it is designed to do? Where are the unit test logs?

Lots to consider.

By @andirk - 7 months
Technical debt. Keep it minimal, and when needed, write a task for it to be looked in to later.

Coding standards. Don't submit code that has rando extra lines and things that will slow down the next dev from looking in to the past to learn what does what.

And most of all, make sure edge cases are covered, often via a truth table of all possible outcomes.

I often comment on a PR saying "blah blah, but not blocking" so I'll allow it but at least my entitled opinion was known in case that code issue comes up later.

My PRs take a long time, because I dig the F in.

By @robertclaus - 7 months
I find many standard processes can be described similarly - if you're mindful of what problem they solve, they should be incredibly useful. The tricks are the important but subtle details like not spending 2 hours straight reviewing an excessively long PR. Those are easy to forget once it's just part of the process.
By @OpenDrapery - 7 months
I like to see working software alongside a PR. GitLab has the concept of a Review Environment, and it spoiled me. Ephemeral, dynamically provisioned environments that are deployed from a feature branch, are absolutely amazing productivity boosts.

It gives so much more context to a PR.

By @jonobird1 - 7 months
I'm not sure code reviews hold much merit. I've been a web developer for around 12 years and I've worked in companies big and small.

I think there should be a manual QA process to test the functionality of what the developer is pushing out.

The issue with code reviews is always that they take so much time for another developer and many devs are super busy so they just have a quick review of the PR and approve or feel they have to add some comments. Context switching between what the dev is already doing and having to come to the PR to review properly means they should switch to that Git branch, pull down the code, test it all and check for logical bugs that a static code review won't pick up.

For juniors, code reviews are still useful as you will be able to spot poor quality code, but for seniors, not as much for the reasons above, better off having a QA process to find and logic holes rather than expecting devs to invest so much time in context switching.

By @dakiol - 7 months
The only thing I don’t like about code reviews are nitpick comments. Everyone has their own subjective way of writing code, if my code works and looks good enough, let it be.
By @alex_lav - 7 months
Code reviews can find bugs.

More often, code reviews become opportunities for team members to bikeshed. Worse, an opportunity for a non-team member to exert power over a project.

By @Mathnerd314 - 7 months
My question is, do human reviewers find more bugs than ChatGPT? Because finding a cofounder costs a lot but asking ChatGPT is free. https://www.thatsoftwaredude.com/content/12848/chatgpt-can-d... says it is mediocre, but that was a year ago and honestly mediocre code reviews seem sufficient.
By @sriharshamur - 7 months
What are some amazing blogs/resources to read to learn how to review PRs?
By @bigcat12345678 - 7 months
Hah? Code review of cuz finds bugs... It's like people do see...
By @banish-m4 - 7 months
Eliminating bugs requires sustained, vigilant, holistic, overlapping approaches:

- Code reviews prior to acceptance of commits (Facebook does this)

- Refactoring crap that manages to get through

- Removing features

- Higher-order languages with less code

- Removal of tech debt

- More eyeballs

- Wiser engineers

- Dedicating more time to better engineering

- Coding guidelines that optimize for straightforward code while not being so strict as to hinder strategic exceptions

- Negative LoC as a KPI