MarkTechPost@AI · July 12, 2024
Generalizable Reward Model (GRM): An Efficient AI Approach to Improve the Generalizability and Robustness of Reward Learning for LLMs

The Generalizable Reward Model (GRM) is a new approach designed to improve the generalizability and robustness of reward learning for large language models (LLMs) by regularizing the hidden states of the reward model. The study shows that GRM significantly improves reward model accuracy across a range of out-of-distribution (OOD) tasks and effectively reduces the overoptimization problem in RLHF.

😁 **How GRM works**: GRM uses text-generation regularization to improve the generalization of reward models. By applying regularization to the reward model's hidden states, it pushes the model to learn more general representations, which leads to better generalization on new data.

🤩 **Advantages of GRM**: GRM effectively addresses the overoptimization problem of reward models and is robust to label noise in preference data. It also performs strongly even when training data is limited, clearly outperforming baseline models.

🤔 **Why GRM matters**: GRM offers a new direction for building stronger reward models, helping to align large language models more effectively and to find solutions in a more cost-effective way.

🚀 **Outlook**: GRM is expected to find applications in areas such as autonomous driving, robotics, and healthcare, helping to improve the safety, reliability, and interpretability of AI systems.

Pretrained large models have shown impressive abilities across many different fields. Recent research focuses on ensuring these models align with human values and avoid harmful behaviors. Alignment methods are crucial here, with two primary approaches: supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). RLHF depends on a reward model that can generalize to new prompt-response pairs, but training a reward model that works well on unseen data is difficult. One common problem is "overoptimization", also known as "reward hacking". Increasing the size of the reward model and the amount of training data can alleviate this issue, but doing so is impractical in real-world settings.

The paper discusses two lines of related work. The first is reward modeling, where reward models are trained on human preference data to guide the RLHF process or prompt optimization; recent research focuses on developing better reward models to improve the performance of large language models (LLMs) in RLHF, for example by improving the quality or quantity of preference data. The second is mitigating overoptimization in RLHF: reward models often overfit and struggle to generalize beyond their training data, which leads to overoptimization. Penalizing overly confident model outputs, for example with label smoothing or SFT regularization, helps reduce this problem.
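To make the label-smoothing mitigation concrete, here is a minimal sketch (not code from the paper) of a standard Bradley-Terry-style pairwise reward loss with an optional label-smoothing term that softens the target preference and penalizes overconfident reward gaps. Function and argument names are illustrative only.

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(r_chosen: torch.Tensor,
                         r_rejected: torch.Tensor,
                         label_smoothing: float = 0.0) -> torch.Tensor:
    """Bradley-Terry style loss on pairwise preferences.

    r_chosen / r_rejected are scalar rewards for the preferred and
    dispreferred responses in a batch. With label_smoothing > 0 the
    target preference probability is softened, which penalizes
    overconfident reward gaps (one of the mitigations mentioned above).
    """
    margin = r_chosen - r_rejected
    # Standard loss is -log sigmoid(margin); the smoothed version mixes in
    # the reversed preference with weight `label_smoothing`.
    loss = -(1.0 - label_smoothing) * F.logsigmoid(margin) \
           - label_smoothing * F.logsigmoid(-margin)
    return loss.mean()

# Example: rewards for a batch of four preference pairs.
r_c = torch.tensor([1.2, 0.3, 0.8, 2.0])
r_r = torch.tensor([0.5, 0.7, 0.1, 1.5])
loss = pairwise_reward_loss(r_c, r_r, label_smoothing=0.1)
```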

Researchers from HKUST, Georgia Institute of Technology, and the University of Illinois Urbana-Champaign have introduced the Generalizable Reward Model (GRM), which uses text-generation regularization on hidden states to improve the performance of reward models. Their study shows that all three types of text-generation regularization work well with GRM, with SFT regularization being the most effective and reliable. The results demonstrate that GRM greatly enhances the accuracy of reward models across various out-of-distribution (OOD) tasks, consistently boosts RLHF performance, and helps reduce overoptimization.
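The paper's exact objective is not reproduced here, but the core idea, adding a text-generation (SFT-style) regularizer computed from the reward model's hidden states through the backbone's language-modeling head, can be sketched roughly as follows. This is an illustrative PyTorch-style sketch under assumed tensor shapes; names such as `grm_style_loss`, `reward_head`, `lm_head`, and the weight `alpha` are hypothetical and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def grm_style_loss(hidden_chosen, hidden_rejected,  # [B, T, H] hidden states from a shared backbone
                   reward_head, lm_head,            # scalar reward head and the original LM head
                   chosen_ids,                      # [B, T] token ids of the preferred response
                   alpha: float = 0.01):
    """Sketch of a GRM-style objective: pairwise reward loss plus an
    SFT-style text-generation regularizer on the same hidden states."""
    # Scalar rewards taken from the final hidden state of each response.
    r_chosen = reward_head(hidden_chosen[:, -1]).squeeze(-1)
    r_rejected = reward_head(hidden_rejected[:, -1]).squeeze(-1)
    reward_loss = -F.logsigmoid(r_chosen - r_rejected).mean()

    # Text-generation regularization: keep the hidden states predictive of
    # the preferred response's tokens via the language-modeling head, so the
    # representation stays general instead of collapsing to the reward task.
    logits = lm_head(hidden_chosen)                      # [B, T, V]
    sft_loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),     # predict token t+1 from position t
        chosen_ids[:, 1:].reshape(-1),
        ignore_index=-100,                               # mask prompt/padding tokens labeled -100
    )
    return reward_loss + alpha * sft_loss
```

The regularization weight and which head stays frozen are design choices in such a setup; per the paper's findings summarized above, the SFT-style variant is the most effective and reliable of the three text-generation regularizers studied.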

The Unified-Feedback dataset, one of the largest collections of pairwise feedback data, is used to train the reward models. All reward models are trained on 400K- and 40K-instance subsets of Unified-Feedback and evaluated on an 8K-instance hold-out set. Performance on OOD preference data is evaluated with datasets such as HHH-Alignment, MT-Bench Human Judgements, and RewardBench. HHH-Alignment evaluates language models on helpfulness, honesty, and harmlessness, while MT-Bench contains human preferences for model responses to MT-bench questions.

In the reported evaluations, GRM consistently improves reward model accuracy on the OOD benchmarks, remains robust to label noise in the preference data, and outperforms baseline reward models even with limited training data.

In conclusion, the researchers have proposed the Generalizable Reward Model (GRM), an efficient method that aims to improve the generalizability and robustness of reward learning for LLMs. GRM applies regularization to the hidden states of reward models, which significantly improves their generalization to unseen data. The approach also effectively reduces the problem of overoptimization in RLHF. These results should support future work on stronger reward models, helping to align large models more efficiently and cost-effectively.


Check out the Paper. All credit for this research goes to the researchers of this project.

