Hallucinations in code are the least dangerous form of LLM mistakes
Hallucinations in code from large language models are less harmful than hallucinations in prose because they usually surface as soon as the code is run. Manual testing remains essential, and developers should engage with and review LLM-generated code to sharpen their skills.
Hallucinations in code generated by large language models (LLMs) are a common issue that can undermine developers' confidence in these tools. However, these hallucinations, which typically involve invented methods or libraries, are the least harmful mistakes LLMs make. Unlike errors in prose, which require careful scrutiny to spot, hallucinated code usually fails the moment it is executed, so developers can catch and correct it quickly. The author stresses manual testing and actually running the code, arguing that trusting code without verification invites serious problems. Developers are encouraged to actively engage with the code LLMs produce, since doing so builds their skill at reading and reviewing code. To reduce hallucinations, the author suggests trying different models, making effective use of context, and favoring well-established libraries and technologies. The piece closes by noting that those who find reviewing LLM-generated code tedious may need to invest in their code review skills.
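As a rough illustration of why these mistakes surface quickly, the sketch below uses a hypothetical hallucinated helper, `csv.parse_markdown_table`, which does not exist in Python's standard library; the point is only that an invented API fails loudly the first time the code runs.

```python
# Minimal sketch: a hallucinated API fails loudly the first time the code runs.
# "parse_markdown_table" is a hypothetical function an LLM might invent; it does
# not exist in the csv module, so calling it raises AttributeError immediately.

import csv


def load_table(text: str):
    try:
        return csv.parse_markdown_table(text)  # hallucinated: no such function
    except AttributeError as exc:
        # The failure is obvious and local, unlike a subtly wrong claim in prose.
        raise RuntimeError(f"Hallucinated API detected: {exc}") from exc


if __name__ == "__main__":
    try:
        load_table("| a | b |\n| 1 | 2 |")
    except RuntimeError as err:
        print(err)
```

Running the code once exposes the invented function; the harder failures are ones that execute but do the wrong thing, which is why the article stresses manual testing over trusting code that merely looks right.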
- Hallucinations in code are less dangerous than those in prose.
- Manual testing of LLM-generated code is essential for ensuring correctness (see the sketch after this list).
- Developers should actively engage with and review code produced by LLMs.
- Using different models and established libraries can reduce hallucinations.
- Improving code review skills is crucial for effectively using LLMs in programming.
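A minimal sketch of what that manual verification might look like, assuming a hypothetical LLM-generated function `slugify` that is supposed to turn titles into URL slugs; the assertions stand in for the recommended step of actually exercising the code rather than assuming it works.

```python
# Sketch: exercising LLM-generated code with quick assertions before trusting it.
# "slugify" is a stand-in for any function an LLM produced; the checks are hypothetical.

import re


def slugify(title: str) -> str:
    """LLM-generated candidate: lowercase, collapse non-alphanumerics into '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def check_slugify():
    # Plausible-looking code can still be wrong; only running it tells you.
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  LLMs & Hallucinations  ") == "llms-hallucinations"
    assert slugify("") == ""


if __name__ == "__main__":
    check_slugify()
    print("all checks passed")
```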
Related
Overcoming the Limits of Large Language Models
Large language models (LLMs) like chatbots face challenges such as hallucinations, lack of confidence estimates, and citations. MIT researchers suggest strategies like curated training data and diverse worldviews to enhance LLM performance.
LLMs Will Always Hallucinate, and We Need to Live with This
The paper by Sourav Banerjee and colleagues argues that hallucinations in Large Language Models are inherent and unavoidable, rooted in computational theory, and cannot be fully eliminated by improvements.
A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs
The paper analyzes package hallucinations in code-generating LLMs, revealing a 5.2% rate in commercial models and 21.7% in open-source models, urging the research community to address this issue.
Internal representations of LLMs encode information about truthfulness
The study examines hallucinations in large language models, revealing that their internal states contain truthfulness information that can enhance error detection, though this encoding is complex and dataset-specific.
AI hallucinations: Why LLMs make things up (and how to fix it)
AI hallucinations in large language models can cause misinformation and ethical issues. A three-layer defense strategy and techniques like chain-of-thought prompting aim to enhance output reliability and trustworthiness.
> I'll finish this rant with a related observation: I keep seeing people say “if I have to review every line of code an LLM writes, it would have been faster to write it myself!”
> Those people are loudly declaring that they have under-invested in the crucial skills of reading, understanding and reviewing code written by other people.
No, that does not follow.
1. Reviewing depends on what you know about the expertise of, and your trust in, the person who wrote the code. Spending most of your day reviewing code written by familiar human co-workers is very different from spending that same time reviewing anonymous contributions.
2. Reviews are not just about the code's potential mechanics, but about inferring and comparing the intent and approach of the writer. For LLMs, that intent ranges from non-existent to incoherent, and writing the code yourself skips that cost.
3. Motivation is important: for some developers that means learning, understanding, and creating. Not wanting to do code reviews all day doesn't mean you're bad at them. Reviewing an LLM's code also has no social aspect.
However you do it, somebody else should still be reviewing the change afterwards.