Communications of the ACM - Artificial Intelligence, February 27
Self-Correction in Large Language Models

Large language models (LLMs) perform impressively across a wide range of tasks, but they also make mistakes. Self-correction, which takes inspiration from the way humans correct themselves, is one approach to improving LLM responses. Research shows that LLMs have limited self-correction ability, especially when relying on their own feedback: they often struggle to detect their own errors and may show a “narcissistic” tendency to favor content they generated themselves. However, their self-correction ability can be improved by using external tools, fine-tuning on dedicated datasets, and leveraging human-annotated data or reinforcement learning. Future research directions include supervised fine-tuning on synthetic datasets and a deeper understanding of how an LLM’s parameters change during self-correction.

🤔 The self-correction ability of large language models (LLMs) is a current research hotspot, aimed at improving their accuracy on tasks such as reasoning and fact-checking. Researchers are exploring methods that let LLMs, like humans, revise their initial answers through reflection and iteration.

🛠️ Existing self-correction methods include having the model evaluate the accuracy of its own output and using external tools, such as search engines, for fact-checking. Improvements can be made during training through fine-tuning, while an answer is being generated, or afterwards. Studies find that using external tools and fine-tuning on dedicated datasets significantly improves LLM self-correction.

⚠️ Research shows that LLMs exhibit “self-bias” during self-correction, a tendency to assume that content they generated is correct, which keeps them from improving their answers. LLMs also face a bottleneck in detecting their own errors, especially on tasks requiring complex reasoning, so improving error detection is key to better self-correction.

The latest generation of chatbots includes valuable tools to help with a variety of tasks, from providing information about a topic to generating computer code. Yet despite their impressive abilities, anyone who uses them regularly will have noticed that they make various types of mistakes. A recent round-up of the failures of OpenAI’s ChatGPT groups its mistakes into 10 different categories, with some examples demonstrating difficulty in reasoning, such as working out the sequence of events in a simple story, multiplying large numbers, or solving riddles.

Self-correction is one approach that could improve the responses generated by the large language models (LLMs) that power chatbots. Although interest so far has largely focused on fixing reasoning errors, self-correction is also being investigated for other tasks. It takes inspiration from the way humans correct themselves.

“When we solve a problem, we might first get an initial solution, and then we try to get some feedback either by self-reflection or from some other sources and then revise our initial answer iteratively until we get the right solution,” said Liangming Pan, an assistant professor at the University of Arizona whose research focuses on how to build LLMs that are logical, truthful, and safe. “People are interested in whether large language models have a similar ability.”

Different self-correction approaches are already being used in LLMs. The first step is typically to get feedback. Models are sometimes harnessed to evaluate the accuracy of their own output, while other approaches use external tools, such as a search engine, to check facts. Responses can be improved at different stages of the process: during training by using specialized fine-tuning methods, while an answer is being generated, or afterwards.
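
The loop below is a minimal sketch of that process at inference time, assuming a generic `llm(prompt)` text-generation function and an optional external fact-checker (for example, a search-engine wrapper); both are hypothetical stand-ins rather than a specific library API.

```python
def self_correct(question, llm, fact_check=None, max_rounds=3):
    """Iteratively revise an answer using self-feedback or an external checker."""
    answer = llm(f"Question: {question}\nAnswer:")
    for _ in range(max_rounds):
        if fact_check is not None:
            # External feedback: verify the answer against retrieved evidence.
            feedback = fact_check(question, answer)
        else:
            # Self-feedback: ask the model to critique its own answer.
            feedback = llm(
                f"Question: {question}\nAnswer: {answer}\n"
                "List any factual or reasoning errors, or reply 'none'."
            )
        if feedback.strip().lower().startswith("none"):
            break  # No issues detected; stop iterating.
        # Revise the answer using the feedback.
        answer = llm(
            f"Question: {question}\nPrevious answer: {answer}\n"
            f"Feedback: {feedback}\nWrite a corrected answer:"
        )
    return answer
```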

Many papers cite specific cases in which an LLM has been able to improve itself, but it is not clear how effective self-correction is overall. Some researchers are now taking a closer look at the big picture. “The common belief was that large language models can correct their own mistakes by themselves,” said Pan.

Ryo Kamoi, a Ph.D. student at Penn State University, and his colleagues were keen to survey existing research on self-correction, since they had come across contradictory findings. Some recent papers suggest it is difficult for LLMs to detect their own mistakes, while others claim they are good at correcting themselves and can do so without using external tools. 

After taking a closer look, the team found that in many of the papers where LLMs appeared able to fix their mistakes, a simple or suboptimal prompt had been used. This produced mediocre initial results that were easy to improve upon. In other cases, favorable results were not generalizable, due to the nature of the task.

“[LLMs] could only do self-correction on very specific tasks,” said Rui Zhang, an assistant professor at Penn State University and one of Kamoi’s co-authors.

Kamoi added that for an LLM to be able to correct its mistakes, it must first be able to detect an error, which is the bottleneck in the process. The tasks that were described as success stories in research papers often had an obvious answer, which made it easy to gauge if it was wrong.

The team also found that using external tools typically helped LLMs fix their mistakes. Models also performed better when they had been fine-tuned during training on datasets specifically designed for self-correction.

Kamoi said that using human-annotated data as feedback has also been proven to help with self-correction. However, it often is not a viable approach since it is time-consuming and costly. A popular alternative is to use reinforcement learning (RL), a paradigm centered on improving from self-generated feedback through trial and error. For example, researchers from Google DeepMind recently developed a high-performing, two-stage approach called SCoRE that uses RL and a reward method to guide a model to self-correct effectively.
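
As a rough illustration of that kind of reward shaping, the sketch below rewards a model's second (corrected) attempt and adds a bonus for turning a wrong first attempt into a right one; the specific weights and the `is_correct` checker are assumptions for illustration, not the published SCoRE formulation.

```python
def correction_reward(first_attempt, second_attempt, reference, is_correct,
                      improvement_bonus=0.5):
    """Reward the corrected attempt; encourage genuine fixes, discourage regressions."""
    first_ok = is_correct(first_attempt, reference)
    second_ok = is_correct(second_attempt, reference)
    reward = 1.0 if second_ok else 0.0
    if not first_ok and second_ok:
        reward += improvement_bonus  # wrong -> right: reward the correction
    if first_ok and not second_ok:
        reward -= improvement_bonus  # right -> wrong: penalize breaking a good answer
    return reward
```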

In follow-up work, Kamoi and his colleagues are exploring a novel avenue for LLM self-correction that involves supervised fine-tuning using synthetic datasets.

“The idea is that if we target specific tasks where we can detect mistakes in large language models relatively easily, we can automatically create datasets with error annotations on the responses of large language models,” said Kamoi. “We are just starting a project to explore whether we can improve the error detection performance of large language models on general tasks by training large language models on [these datasets].”
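
One way such data could be built automatically, sketched below, is to pick a task whose answers are trivially checkable (here, multiplication) and label each model response with whether it contains an error; this is an illustrative, hypothetical pipeline, not the team's actual method.

```python
import random

def make_error_detection_examples(llm, n=100):
    """Generate (response, error-label) pairs for an easily checkable task."""
    examples = []
    for _ in range(n):
        a, b = random.randint(100, 999), random.randint(100, 999)
        question = f"What is {a} * {b}?"
        response = llm(f"{question} Answer with just the number.")
        try:
            has_error = int(response.strip()) != a * b
        except ValueError:
            has_error = True  # Unparsable output counts as an error.
        # Each example pairs a model response with an automatic error label,
        # which could later supervise fine-tuning for error detection.
        examples.append({"question": question,
                         "response": response,
                         "has_error": has_error})
    return examples
```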

Pan and his colleagues also examined studies related to LLM self-correction in a survey paper published earlier this year focusing on recent approaches that use automated feedback. Similar to Kamoi’s team’s findings, they found there were more examples of LLMs not being able to self-correct when relying on their own feedback, compared to success stories.

“The hardest part is how to get very accurate feedback, especially with reasoning tasks when you want intermediate feedback (before the entire response is generated),” said Pan.

Although some studies gave examples of specific tasks where LLMs were able to improve their performance from self-generated feedback, there were also many cases where their responses got worse. Pan and his colleagues suspect models exhibit something akin to narcissism, favoring content they themselves generate, a tendency called self-bias.

The team followed up with another study to investigate the suspected self-bias. They analyzed six different LLMs, testing how they behaved in four different languages while performing three tasks: machine translation, text generation, and mathematical reasoning. While using a correction method called self-refinement, where the quality of a response is assessed through self-feedback and improved during a number of pre-defined iterations, they attempted to quantify self-bias in each model.  
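
One plausible way to quantify that bias, sketched below, is to compare the score a model assigns to its own output with a score from an external judge (a human rating or an automatic metric) at each refinement round; the averaging scheme here is an assumption for illustration, not the exact definition used in the study.

```python
def self_bias(self_scores, external_scores):
    """Mean gap between the model's self-assigned quality scores and external
    scores for the same outputs; positive values indicate self-inflation."""
    assert len(self_scores) == len(external_scores)
    gaps = [s - e for s, e in zip(self_scores, external_scores)]
    return sum(gaps) / len(gaps)

def bias_by_round(rounds):
    """rounds[i] holds (self_score, external_score) pairs from refinement round i."""
    return [self_bias([s for s, _ in r], [e for _, e in r]) for r in rounds]
```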

They found that all models exhibited self-bias, regardless of the task and language, which affected their ability to optimize their responses. Although a model’s output often improved in terms of wording, becoming easier to understand, the quality of the answer itself often was no better. In some cases, generated text was rated more favorably if it mirrored the LLM’s style.

Furthermore, an LLM’s partiality was exacerbated during the optimization process. “[Self-bias] becomes stronger and stronger as you do more rounds of self-correction,” said Pan.

Pan and his colleagues have proposed ways of mitigating self-bias and, in turn, improving self-correction. In some of their experiments, they showed that larger models were less partial to their own output and were better at fixing their own mistakes, which suggests increasing the size of an LLM is one potential solution.

Pan thinks a better theoretical understanding of self-correction is needed to develop more effective approaches; for example, probing what is happening to different parameters in an LLM while it is trying to self-correct could reveal new details about the process.

More in-depth knowledge should help uncover the limits of self-correction and whether it is impossible in certain scenarios. In addition, it could allow self-correction to be used in tasks such as generating open-ended dialogue, where it is hard to define what is a mistake and provide objective feedback. Until now, LLM self-correction has focused on tasks with well-defined answers, such as reasoning problems. “Application-wise, there is a lot of future work we can do,” said Pan.

Sandrine Ceurstemont is a freelance science writer based in London, U.K.
