The goal of generative AI tools powered by large language models (LLMs) is to finish the task assigned to them: to provide a complete response to a prompt. As is now well established, models sometimes make things up, or hallucinate, to achieve this. In natural-language outputs, hallucinations vary in seriousness, from minimal in a shopping list to potentially consequential in a scientific text. In code generation, hallucinations are easier to spot and the consequences are clear: the code doesn’t work properly, or at all.
AI is already upending how developers work across multiple programming languages. According to Google’s 2024 State of DevOps Report, 74.9% of survey respondents said they are already using AI in software development. Yet the thorny issue of hallucinations remains.
Researchers cognizant of the impact of hallucinations—from time wasted fixing errors to cybersecurity threats—are now devising mitigation methods.
Not Knowing Is Not an Option
“Models are trained to always say something. They are rarely trained to say, ‘Oh, I don’t know,’ which is what a human would sometimes do,” said Michael Pradel, a computer science professor and software engineering expert at the University of Stuttgart in Germany. “If the AI is suggesting code that is nonsense and doesn’t refer to the actual code base or libraries they are working with, it’s not super-helpful.”
Pradel said that when a model starts hallucinating, it can take a developer more time to fix the generated code than it would to write it manually. Companies that work on private code bases are especially vulnerable, he said, as popular LLMs trained on public data mined from online sources have not seen their code. He explained, “The chance to get hallucinations is much, much bigger because the models don’t know the facts about their code base.”
While companies with sufficient resources may develop their own models trained on their own code, for others “off the shelf” is the only option, said Pradel. “These off-the-shelf models just don’t know the code base of a small or medium-sized company.”
Agentic approaches, where LLM-based agents seek out and call APIs, code bases, and documentation, can help mitigate hallucinations. With co-authors Islem Bouzenia, also of the University of Stuttgart, and Premkumar Devanbu of the University of California, Davis, Pradel developed RepairAgent, an autonomous agent that automatically fixes bugs by modifying code. “We do this by letting the LLM actively call different tools, like code search tools, which has the nice benefit of also handling hallucinations,” Pradel said.
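To make that mechanism concrete, the following is a minimal Python sketch of the kind of tool-calling loop such an agent runs; it is not RepairAgent’s actual implementation, and query_llm is a hypothetical stand-in for whatever chat-completion API is in use. The model may request a code search, the harness runs it against the real repository, and the results are fed back so later suggestions are grounded in code that actually exists.

# Illustrative agentic loop in the spirit described above; not RepairAgent's code.
import json
import re
from pathlib import Path

def search_code(repo_root: str, pattern: str, max_hits: int = 5) -> list[str]:
    """Tool: grep-style search over the real code base so the model can ground itself."""
    hits = []
    for path in Path(repo_root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if re.search(pattern, line):
                hits.append(f"{path}:{lineno}: {line.strip()}")
                if len(hits) >= max_hits:
                    return hits
    return hits

def query_llm(messages: list[dict]) -> str:
    """Hypothetical placeholder; wire this to a real chat-completion endpoint."""
    raise NotImplementedError

def run_agent(task: str, repo_root: str, max_steps: int = 10) -> str:
    messages = [{
        "role": "system",
        "content": ("You fix bugs. To look things up, reply with JSON like "
                    '{"tool": "search_code", "pattern": "<regex>"}. '
                    'When done, reply with {"patch": "<unified diff>"}.'),
    }, {"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = query_llm(messages)
        messages.append({"role": "assistant", "content": reply})
        try:
            action = json.loads(reply)
        except json.JSONDecodeError:
            messages.append({"role": "user", "content": "Reply with valid JSON."})
            continue
        if "patch" in action:
            return action["patch"]  # candidate fix, still to be validated by tests
        # Execute the requested search against the actual repository and feed the
        # results back, so later suggestions refer to code that really exists.
        hits = search_code(repo_root, action.get("pattern", ""))
        messages.append({"role": "user", "content": "search results:\n" + "\n".join(hits)})
    return ""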
In another approach, Pradel and Ph.D. researcher Aryaz Eghbali have presented De-Hallucinator, a technique for mitigating LLM hallucinations within project-specific APIs by combining relevant API references with iterative grounding. “The idea of De-Hallucinator is to use this bad feature of LLMs to our advantage,” said Pradel.
The method exploits the fact that hallucinations often sound credible. When a model hallucinates an API, De-Hallucinator automatically identifies existing, project-specific APIs with similar names; it then iteratively augments the prompt with those names. Said Pradel, “We give this as additional context, we inject a little bit of project-specific knowledge into the prompt based on what the LLM has suggested in the first round.”
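As a rough illustration of that idea, and not the authors’ implementation, the sketch below matches a hallucinated API name against a project’s real APIs by string similarity and injects the closest matches into a follow-up prompt; the API names are invented for the example.

# Rough sketch of the idea described above, not De-Hallucinator's actual code:
# match a hallucinated API name against the project's real APIs and feed the
# closest ones back into a second prompt.
import difflib

def closest_project_apis(hallucinated: str, project_apis: list[str], k: int = 3) -> list[str]:
    """Return the k project APIs whose names most resemble the hallucinated one."""
    return difflib.get_close_matches(hallucinated, project_apis, n=k, cutoff=0.4)

def augment_prompt(original_prompt: str, candidates: list[str]) -> str:
    """Inject a little project-specific knowledge before re-prompting the model."""
    context = "\n".join(f"- {api}" for api in candidates)
    return (f"{original_prompt}\n\n"
            f"Note: only the following project APIs exist; use one of them:\n{context}")

# Example: the model invented `client.fetch_user_record`, but the project's real
# APIs (extracted from its own code) are these:
project_apis = ["client.get_user", "client.get_user_by_id", "client.list_users"]
suggested = closest_project_apis("client.fetch_user_record", project_apis)
print(augment_prompt("Complete the function that loads a user profile.", suggested))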
Pradel and Eghbali evaluated De-Hallucinator on code completion and test generation tasks using Python and JavaScript, and concluded that the technique “significantly improves the quality of generations over the state-of-the-art baselines.”
Coming to Grips with the Issue
Setting out to build community understanding of the challenges code hallucinations pose, a collaboration of U.S., Chinese, and Japanese researchers has presented a method of classifying them using execution verification. The researchers defined four main categories of hallucination: mapping, naming, resource, and logic, each of which has sub-categories. They have also developed CodeHalu, a publicly available algorithm for detecting and quantifying code hallucinations, and CodeHaluEval, a benchmark for evaluating them.
Meanwhile, a team of researchers from Beihang University in Beijing, Shandong University in Qingdao, and Huawei Cloud Computing Technologies has used open coding and iterative refinement to develop a taxonomy of hallucinations that includes categories and sub-categories such as knowledge conflicts, inconsistency, repetition, and dead code. The work also features a benchmark, named HALLUCODE, for evaluating LLMs’ performance in recognizing code hallucinations.
An Unwanted Package
Hallucinations may also pose a viable threat to cybersecurity. Researchers from the University of Texas at San Antonio, the University of Oklahoma, and Virginia Tech recently posited that package hallucinations, which occur when a model generates code that recommends or references a package that does not exist, are ripe for exploitation by malicious attackers.
Code generation has structures that models must follow to generate functioning code, but that makes it easier for models to “pin themselves in a corner,” said lead author and Ph.D. cybersecurity researcher Joseph Spracklen. He explained that as there is a finite list of packages that exist, “It’s possible for the model to choose tokens in such a way that suddenly it says, ‘Oh wait, stop, there’s nothing I can pick from at this point that would lead to a valid package.’” Unable to express uncertainty, models are then forced to hallucinate package names to complete their task.
Spracklen and the other researchers analyzed 576,000 Python and JavaScript code samples generated by 16 commercial and open-source LLMs, finding an average hallucinated-package rate of 5.2% for the commercial models and 21.7% for the open-source ones. Said Spracklen, “Not only are these hallucinations prevalent, but they are persistent, and they are prone to being regenerated multiple times.”
The evidence of repetition in hallucinated names is particularly ominous because it raises the chances of so-called “package confusion” attacks. According to Spracklen, once malicious actors have identified package hallucinations, perhaps by parsing LLM output for package names and checking which ones do not exist in reality, they can register a package “laced with malicious code” under the hallucinated name on a public repository such as PyPI (for Python) or npm (for JavaScript). Said Spracklen, “It’s very straightforward because these are open-source repositories, they’re anonymous.” Then the attacker would “just sit back and wait,” he continued.
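The sketch below illustrates the kind of existence check involved, seen from the defender’s side; it is not the researchers’ tooling. It parses the packages a piece of generated Python code imports and asks PyPI’s JSON endpoint, which returns HTTP 404 for names that have never been registered, whether each one exists. The package names are invented for the example.

# Illustrative check, not the researchers' tooling: extract imported package
# names from generated Python code and see whether they exist on PyPI.
import ast
import urllib.error
import urllib.request

def imported_packages(source: str) -> set[str]:
    """Collect top-level package names imported by a piece of generated code."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split(".")[0])
    return names

def exists_on_pypi(package: str) -> bool:
    """PyPI's JSON API (pypi.org/pypi/<name>/json) answers 404 for unregistered names."""
    try:
        with urllib.request.urlopen(f"https://pypi.org/pypi/{package}/json", timeout=10):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise

generated = "import requests\nimport totally_made_up_http_lib\n"
for pkg in imported_packages(generated):
    if not exists_on_pypi(pkg):
        print(f"'{pkg}' is not on PyPI: a hallucination, and a squatting target")

A real check would also need to skip standard-library modules and account for private package indexes; the point here is only the existence test itself.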
Compromises occur when a developer, using an LLM as part of their normal workflow, unwittingly downloads the malicious package. “Similar to a phishing attack, they’re exploiting the trust that a user has in the model, and they are relying on the user not to do their due diligence,” Spracklen said.
Deploying Retrieval Augmented Generation (RAG) strategies, such as having an LLM check whether a package is on a master list, is not sufficient mitigation, said Spracklen. “The insidious thing about this vulnerability is that once a malicious actor publishes a malicious package, now that package is on the master list.” Instead, Spracklen proposes combining RAG with curated package lists and fine-tuning techniques. “You fine-tune your LLM based off of known good responses that it’s given in the past and known good packages,” he said.
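A minimal sketch of the curated-list half of that combined mitigation might look like the following; it is not Spracklen’s tooling, the fine-tuning half is not shown, and the allowlist entries are illustrative. The retrieval context offered to the model is built only from an organization-maintained allowlist, which, unlike a registry’s own master list, does not grow just because an attacker registers a new package.

# Sketch of retrieval context restricted to a curated allowlist (illustrative entries).
CURATED_PACKAGES = {
    "requests": "HTTP client library; vetted version >= 2.31",
    "numpy": "Numerical arrays; vetted version >= 1.26",
}

def build_retrieval_context(candidate_packages: list[str]) -> str:
    """Return prompt context for vetted packages only; flag the rest for review."""
    lines = []
    for pkg in candidate_packages:
        note = CURATED_PACKAGES.get(pkg.lower())
        if note is None:
            print(f"'{pkg}' is not on the curated list; holding for human review")
            continue
        lines.append(f"{pkg}: {note}")
    return "Approved packages you may use:\n" + "\n".join(lines)

print(build_retrieval_context(["requests", "totally_made_up_http_lib"]))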
Hallucinations are increasingly viewed as a feature of LLMs rather than a bug, and they appear—so far—to be baked in. However, users are adapting. Just as some content producers have pivoted towards fact-checking generated text rather than creating their own, some developers may find themselves increasingly playing the role of hallucination-fixer.
Karen Emslie is a location-independent freelance journalist and essayist.