March 2nd, 2025

Hallucinations in code are the least dangerous form of LLM mistakes

Hallucinations in code from large language models are less harmful than hallucinations in prose, because running the code exposes them. Manual testing remains essential, and developers should engage with and review LLM-generated code to sharpen their reviewing skills.

Hallucinations in code generated by large language models (LLMs) are a common issue that can undermine developers' confidence in these tools. However, these hallucinations, which typically take the form of invented methods or non-existent libraries, are the least harmful category of LLM mistake. Unlike errors in prose, which require careful scrutiny to identify, hallucinated code usually fails the moment it is executed, allowing developers to spot and fix it quickly. The author emphasizes the importance of manually testing and actually running the code, arguing that trusting code without verification can lead to significant problems. Developers are encouraged to actively engage with the code produced by LLMs, since doing so builds their skill at reading and reviewing code. To reduce hallucinations, the author suggests trying different models, making deliberate use of the context, and opting for well-established technologies that models are more likely to have seen in training. The piece concludes by noting that those who find reviewing LLM-generated code tedious may need to invest in their code review skills.
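
To make the "fails loudly" point concrete, here is a minimal Python sketch; the `json.load_pretty` call is a hypothetical hallucination (no such function exists in the standard library), and simply running the snippet exposes it:

```python
import json

# A plausible-looking but hallucinated API call: Python's json module has
# no load_pretty() function, so this fails the first time it is executed,
# unlike a subtle factual error buried in prose.
try:
    data = json.load_pretty("config.json")
except AttributeError as exc:
    print(f"Hallucinated API caught at runtime: {exc}")
```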

- Hallucinations in code are less dangerous than those in prose.

- Manual testing of LLM-generated code is essential for ensuring correctness (see the sketch after this list).

- Developers should actively engage with and review code produced by LLMs.

- Using different models and established libraries can reduce hallucinations.

- Improving code review skills is crucial for effectively using LLMs in programming.
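
As a concrete illustration of the manual-testing advice, the sketch below pairs a plausible, hypothetical LLM-generated helper with a quick check against inputs whose correct output is already known; both the function and the test cases are invented for illustration, not taken from the article:

```python
# Hypothetical LLM-generated helper: turn a title into a URL slug.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())

# Exercise it by hand with inputs you already know the answer to,
# rather than trusting that the code merely looks right.
checks = {
    "Hello World": "hello-world",
    "Symbols & Punctuation!": "symbols-punctuation",  # running this exposes a gap a skim would miss
}

for given, expected in checks.items():
    got = slugify(given)
    status = "ok" if got == expected else "MISMATCH"
    print(f"{status}: slugify({given!r}) -> {got!r} (expected {expected!r})")
```

Running it prints a MISMATCH for the punctuation case, which is exactly the kind of issue that executing the code surfaces faster than skimming it.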

3 comments
By @Terr_ - 2 months
As much as I've agreed with the author's other posts/takes, I find myself resisting this one:

> I'll finish this rant with a related observation: I keep seeing people say “if I have to review every line of code an LLM writes, it would have been faster to write it myself!”

> Those people are loudly declaring that they have under-invested in the crucial skills of reading, understanding and reviewing code written by other people.

No, that does not follow.

1. Reviewing depends on what you know about the expertise (and trust) of the person writing it. Spending most of your day reviewing code written by familiar human co-workers is very different from spending that time reviewing anonymous contributions.

2. Reviews are not just about the code's potential mechanics, but inferring and comparing the intent and approach of the writer. For LLMs, that ranges between non-existent and schizoid, and writing it yourself skips that cost.

3. Motivation is important; for some developers that means learning, understanding, and creating. Not wanting to do code reviews all day doesn't mean you're bad at them. Also, reviewing an LLM's code has no social aspect.

However you do it, somebody else should still be reviewing the change afterwards.

By @eternityforest - 2 months
It's exactly the same problem with human-written code. To me it seems like it's not an LLM problem; it's a lack-of-testing-and-review problem.
By @sonorous_sub - 2 months
You have to make sure the machine is hypnotized correctly, or otherwise it can hallucinate on you.